ClickHouse® has a reputation: it’s supposed to be fast.
And it is – but only under certain conditions.
A common mistake is assuming that if a query is slow, the dataset is too large or the hardware isn’t good enough. In most cases, neither is true. Slow queries in ClickHouse® usually come down to how the data is stored and how the query interacts with that storage.
This blog is not a general overview. It’s a breakdown of the issues you’ll likely run into when queries are slower than expected, along with what to check and how to fix them.
1. Start With This Question: What Is ClickHouse® Actually Scanning?
Before changing anything, run:
EXPLAIN indexes = 1
SELECT ...
Or check:
SELECT *
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query = 'your_query'
ORDER BY event_time DESC
LIMIT 1;
Look at:
- read_rows
- read_bytes
If ClickHouse® is reading most of the table, the problem is not your SQL syntax. It’s that the engine cannot skip data efficiently.
Everything below ties back to this.
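As a quick sanity check, compare what a query actually read against the table's total size. A sketch, assuming an `events` table and matching the query text loosely:

```sql
-- How much did recent queries against `events` actually read?
SELECT
    event_time,
    read_rows,
    formatReadableSize(read_bytes) AS read
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%FROM events%'
ORDER BY event_time DESC
LIMIT 5;

-- Total rows currently stored, for comparison:
SELECT sum(rows) AS total_rows
FROM system.parts
WHERE table = 'events' AND active;
```

If read_rows is close to total_rows for a filtered query, data skipping isn't happening.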
2. Mismatch Between ORDER BY and Your Filters
This is the most common issue.
Example table:
CREATE TABLE events
(
    user_id   UInt64,
    timestamp DateTime
    -- other columns omitted
)
ENGINE = MergeTree()
ORDER BY (user_id);
Typical query:
SELECT count(*)
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY;
What actually happens?
Data is sorted by user_id, but your filter is on timestamp.
ClickHouse® cannot use the sort order to skip data, so it scans almost everything.
What to check:
Run:
EXPLAIN indexes = 1
SELECT count(*)
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY;
If you see that most granules are selected, your ORDER BY is not helping.
Align ORDER BY with your most frequent filters:
ORDER BY (timestamp, user_id)
If you filter by multiple dimensions, the order matters. Lead with the columns you filter on most often; among those, a good rule of thumb is to go from lower to higher cardinality, since low-cardinality leading columns keep the sort order useful for the columns that follow.
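Putting it together, the realigned table might look like this (column types here are assumptions, not part of the original example):

```sql
CREATE TABLE events
(
    user_id   UInt64,
    timestamp DateTime,
    event     String
)
ENGINE = MergeTree
ORDER BY (timestamp, user_id);
```

With timestamp leading, the one-day filter from the query above maps directly onto the sort order, so most granules can be skipped.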
3. Too Many Parts (Usually Caused by Small Inserts)
If query performance degrades over time, check this:
SELECT
table,
count() AS parts,
sum(rows) AS total_rows
FROM system.parts
WHERE active = 1
GROUP BY table
ORDER BY parts DESC;
What this tells you
If a table has thousands of active parts, queries will slow down because ClickHouse® has to read from many files.
Why it happens
- Frequent small inserts
- Streaming data without batching
Each insert creates a new part. Merges happen in the background, but they can’t always keep up.
Fix
Batch your inserts.
Instead of inserting rows individually, send data in chunks (e.g., 10k–100k rows per insert).
If you’re using a pipeline (Kafka, etc.), make sure batching is configured properly.
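If the client side can't batch, recent ClickHouse® versions can buffer small writes on the server with asynchronous inserts. A minimal sketch (the column values are made up):

```sql
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (42, now(), 'click');
```

The server accumulates these buffered writes into larger parts instead of creating one part per insert.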
4. Misuse of PARTITION BY
Partitioning is often misunderstood.
Example:
PARTITION BY toDate(timestamp)
At first, this looks reasonable, but it creates one partition per day. If your data spans a year, that's 365+ partitions, each with its own set of parts, so queries that don't filter on the partition key have to touch many of them.
What to check
SELECT
partition,
count()
FROM system.parts
WHERE table = 'events'
GROUP BY partition
ORDER BY partition;
Fix
Use coarser partitions:
PARTITION BY toYYYYMM(timestamp)
Partitioning is for data management, not primary query optimization. ORDER BY matters more for query speed.
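You can verify that partition pruning actually kicks in the same way as before: EXPLAIN shows how many partitions survive the filter. This assumes the monthly scheme above:

```sql
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE timestamp >= toStartOfMonth(now());
```

The output should show only the current month's partition being selected; if every partition appears, the filter isn't hitting the partition key.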
5. FINAL Is Slower Than You Think
If you’re using engines like ReplacingMergeTree and running queries like:
SELECT * FROM events FINAL;
You’re forcing ClickHouse® to merge data during query execution.
What to check
Compare:
SELECT count(*) FROM events;
SELECT count(*) FROM events FINAL;
If there's a significant difference in execution time, FINAL is the reason. (Note the comparison is generous to FINAL: a plain count(*) can often be answered from part metadata, while FINAL forces an on-the-fly merge of every matching row.)
When FINAL is actually needed
- Deduplication at query time
- Reading the latest version of rows
Alternatives
- Accept eventual consistency
- Use materialized views to maintain cleaned data
- Run periodic OPTIMIZE (carefully)
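One common alternative worth spelling out: resolve the latest version per key with argMax instead of FINAL. This sketch assumes a ReplacingMergeTree keyed by user_id with status and version columns (the names are assumptions):

```sql
-- Latest status per user, without FINAL:
SELECT
    user_id,
    argMax(status, version) AS status
FROM events
GROUP BY user_id;
```

This is often cheaper than FINAL because it's a plain aggregation rather than a query-time merge, though it still reads all row versions.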
6. Recomputing the Same Aggregations Repeatedly
If you run queries like:
SELECT user_id, count(*)
FROM events
GROUP BY user_id;
on a large dataset, ClickHouse® will scan and aggregate every time.
What to check
Look at read_rows in the system.query_log. If it’s consistently high for repeated queries, you’re recomputing too much.
Fix
Use projections and materialized views.
This shifts the cost from query time to ingestion time.
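As a sketch of the materialized-view approach (table and column names are assumptions): maintain per-user counts in a SummingMergeTree that a materialized view feeds at insert time.

```sql
-- Rollup table: rows with the same user_id are summed during merges.
CREATE TABLE user_event_counts
(
    user_id UInt64,
    events  UInt64
)
ENGINE = SummingMergeTree
ORDER BY user_id;

-- The view aggregates each insert into `events` and writes the result.
CREATE MATERIALIZED VIEW user_event_counts_mv
TO user_event_counts AS
SELECT user_id, count() AS events
FROM events
GROUP BY user_id;

-- Query the small rollup table; sum() covers parts not yet merged:
SELECT user_id, sum(events) AS events
FROM user_event_counts
GROUP BY user_id;
```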
7. Joins Are Not Free
ClickHouse® supports joins, but they are not its strength.
Common mistake
Normalizing data and joining at query time:
SELECT e.*, u.country
FROM events e
JOIN users u ON e.user_id = u.id;
Problem
Joins can:
- Increase memory usage (by default, the right-hand table is loaded into memory to build a hash table)
- Slow down queries significantly
Fix
Denormalize where possible.
Store frequently used fields together, even if it means duplicating data.
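When full denormalization isn't practical, a dictionary can replace the join with a hash lookup. A sketch, assuming a `users` table with `id` and `country` columns:

```sql
CREATE DICTIONARY users_dict
(
    id      UInt64,
    country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- The join becomes a per-row lookup into the in-memory dictionary:
SELECT e.*, dictGet('users_dict', 'country', e.user_id) AS country
FROM events AS e;
```

The dictionary is refreshed in the background on the LIFETIME interval, so lookups may lag the source table slightly.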
8. You’re Not Looking at system Tables
ClickHouse® gives you internal visibility. Use it.
Useful tables:
- system.query_log → what queries are doing
- system.parts → storage layout
- system.merges → background activity
If you’re debugging performance without these, you’re guessing.
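A quick example of what these tables buy you: checking whether background merges are keeping up, which often explains the gradual slowdowns described in issue 3:

```sql
-- Currently running merges; a long backlog here means parts
-- are accumulating faster than they are compacted.
SELECT
    database,
    table,
    elapsed,
    progress,
    formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges;
```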
Closing Thought
ClickHouse® performance is predictable once you understand one thing:
It is a system designed to scan less data, not magically process everything faster.
If it reads less, it’s fast.
If it reads more, it’s slow.
So when a query is slow, don’t start by rewriting the query.
Start by asking:
Why is ClickHouse® reading so much data?
That question usually leads you directly to the problem.
References
https://clickhouse.com/docs/sql-reference/statements/select/order-by
https://clickhouse.com/docs/knowledgebase/optimize_final_vs_final
https://clickhouse.com/blog/clickhouse-debugging-issues-with-system-tables
