ClickHouse® has a reputation: it’s supposed to be fast.
And it is – but only under certain conditions.
A common mistake is assuming that if a query is slow, the dataset is too large or the hardware isn’t good enough. In most cases, neither is true. Slow queries in ClickHouse® usually come down to how the data is stored and how the query interacts with that storage.
This blog is not a general overview. It’s a breakdown of the issues you’ll likely run into when queries are slower than expected, along with what to check and how to fix them.
1. Start With This Question: What Is ClickHouse® Actually Scanning?
Before changing anything, run:
EXPLAIN indexes = 1
SELECT ...
Or check:
SELECT *
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query = 'your_query'
ORDER BY event_time DESC
LIMIT 1;
Look at:
- read_rows
- read_bytes
If ClickHouse® is reading most of the table, the problem is not your SQL syntax. It’s that the engine cannot skip data efficiently.
Everything below ties back to this.
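As a quick sanity check, compare what a query actually read against the table's total size. A sketch, assuming an `events` table and matching the query text loosely:

```sql
-- How much did recent queries against `events` actually read?
SELECT
    event_time,
    read_rows,
    formatReadableSize(read_bytes) AS read
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%FROM events%'
ORDER BY event_time DESC
LIMIT 5;

-- Total rows currently stored, for comparison:
SELECT sum(rows) AS total_rows
FROM system.parts
WHERE table = 'events' AND active;
```

If read_rows is close to total_rows for a filtered query, data skipping isn't happening.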
2. Mismatch Between ORDER BY and Your Filters
This is the most common issue.
Example table:
CREATE TABLE events
(
    user_id   UInt64,
    timestamp DateTime
    -- other columns omitted
)
ENGINE = MergeTree()
ORDER BY (user_id);
Typical query:
SELECT count(*)
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY;
What actually happens?
Data is sorted by user_id, but your filter is on timestamp.
ClickHouse® cannot use the sort order to skip data, so it scans almost everything.
What to check:
Run:
EXPLAIN indexes = 1
SELECT count(*)
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY;
If you see that most granules are selected, your ORDER BY is not helping.
Align ORDER BY with your most frequent filters:
ORDER BY (timestamp, user_id)
If you filter by multiple dimensions, the order matters. Lead with the columns you filter on most often; among those, a good rule of thumb is to go from lower to higher cardinality, since low-cardinality leading columns keep the sort order useful for the columns that follow.
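Putting it together, the realigned table might look like this (column types here are assumptions, not part of the original example):

```sql
CREATE TABLE events
(
    user_id   UInt64,
    timestamp DateTime,
    event     String
)
ENGINE = MergeTree
ORDER BY (timestamp, user_id);
```

With timestamp leading, the one-day filter from the query above maps directly onto the sort order, so most granules can be skipped.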
3. Too Many Parts (Usually Caused by Small Inserts)
If query performance degrades over time, check this:
SELECT
table,
count() AS parts,
sum(rows) AS total_rows
FROM system.parts
WHERE active = 1
GROUP BY table
ORDER BY parts DESC;
What this tells you
If a table has thousands of active parts, queries will slow down because ClickHouse® has to read from many files.
Why it happens
- Frequent small inserts
- Streaming data without batching
Each insert creates a new part. Merges happen in the background, but they can’t always keep up.
Fix
Batch your inserts.
Instead of inserting rows individually, send data in chunks (e.g., 10k–100k rows per insert).
If you’re using a pipeline (Kafka, etc.), make sure batching is configured properly.
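If the client side can't batch, recent ClickHouse® versions can buffer small writes on the server with asynchronous inserts. A minimal sketch (the column values are made up):

```sql
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (42, now(), 'click');
```

The server accumulates these buffered writes into larger parts instead of creating one part per insert.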
4. Misuse of PARTITION BY
Partitioning is often misunderstood.
Example:
PARTITION BY toDate(timestamp)
At first, this looks reasonable, but it creates one partition per day. If your data spans a year, that's 365+ partitions, each with its own set of parts, so queries that don't filter on the partition key have to touch many of them.
What to check
SELECT
partition,
count()
FROM system.parts
WHERE table = 'events'
GROUP BY partition
ORDER BY partition;
Fix
Use coarser partitions:
PARTITION BY toYYYYMM(timestamp)
Partitioning is for data management, not primary query optimization. ORDER BY matters more for query speed.
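You can verify that partition pruning actually kicks in the same way as before: EXPLAIN shows how many partitions survive the filter. This assumes the monthly scheme above:

```sql
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE timestamp >= toStartOfMonth(now());
```

The output should show only the current month's partition being selected; if every partition appears, the filter isn't hitting the partition key.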
5. FINAL Is Slower Than You Think
If you’re using engines like ReplacingMergeTree and running queries like:
SELECT * FROM events FINAL;
You’re forcing ClickHouse® to merge data during query execution.
What to check
Compare:
SELECT count(*) FROM events;
SELECT count(*) FROM events FINAL;
If there's a significant difference in execution time, FINAL is the reason. (Note the comparison is generous to FINAL: a plain count(*) can often be answered from part metadata, while FINAL forces an on-the-fly merge of every matching row.)
When FINAL is actually needed
- Deduplication at query time
- Reading the latest version of rows
Alternatives
- Accept eventual consistency
- Use materialized views to maintain cleaned data
- Run periodic OPTIMIZE (carefully)
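One common alternative worth spelling out: resolve the latest version per key with argMax instead of FINAL. This sketch assumes a ReplacingMergeTree keyed by user_id with status and version columns (the names are assumptions):

```sql
-- Latest status per user, without FINAL:
SELECT
    user_id,
    argMax(status, version) AS status
FROM events
GROUP BY user_id;
```

This is often cheaper than FINAL because it's a plain aggregation rather than a query-time merge, though it still reads all row versions.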
6. Recomputing the Same Aggregations Repeatedly
If you run queries like:
SELECT user_id, count(*)
FROM events
GROUP BY user_id;
on a large dataset, ClickHouse® will scan and aggregate every time.
What to check
Look at read_rows in the system.query_log. If it’s consistently high for repeated queries, you’re recomputing too much.
Fix
Use projections and materialized views.
This shifts the cost from query time to ingestion time.
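As a sketch of the materialized-view approach (table and column names are assumptions): maintain per-user counts in a SummingMergeTree that a materialized view feeds at insert time.

```sql
-- Rollup table: rows with the same user_id are summed during merges.
CREATE TABLE user_event_counts
(
    user_id UInt64,
    events  UInt64
)
ENGINE = SummingMergeTree
ORDER BY user_id;

-- The view aggregates each insert into `events` and writes the result.
CREATE MATERIALIZED VIEW user_event_counts_mv
TO user_event_counts AS
SELECT user_id, count() AS events
FROM events
GROUP BY user_id;

-- Query the small rollup table; sum() covers parts not yet merged:
SELECT user_id, sum(events) AS events
FROM user_event_counts
GROUP BY user_id;
```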
7. Joins Are Not Free
ClickHouse® supports joins, but they are not its strength.
Common mistake
Normalizing data and joining at query time:
SELECT e.*, u.country
FROM events e
JOIN users u ON e.user_id = u.id;
Problem
Joins can:
- Increase memory usage (by default, the right-hand table is loaded into memory to build a hash table)
- Slow down queries significantly
Fix
Denormalize where possible.
Store frequently used fields together, even if it means duplicating data.
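When full denormalization isn't practical, a dictionary can replace the join with a hash lookup. A sketch, assuming a `users` table with `id` and `country` columns:

```sql
CREATE DICTIONARY users_dict
(
    id      UInt64,
    country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- The join becomes a per-row lookup into the in-memory dictionary:
SELECT e.*, dictGet('users_dict', 'country', e.user_id) AS country
FROM events AS e;
```

The dictionary is refreshed in the background on the LIFETIME interval, so lookups may lag the source table slightly.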
8. You’re Not Looking at system Tables
ClickHouse® gives you internal visibility. Use it.
Useful tables:
- system.query_log → what queries are doing
- system.parts → storage layout
- system.merges → background activity
If you’re debugging performance without these, you’re guessing.
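A quick example of what these tables buy you: checking whether background merges are keeping up, which often explains the gradual slowdowns described in issue 3:

```sql
-- Currently running merges; a long backlog here means parts
-- are accumulating faster than they are compacted.
SELECT
    database,
    table,
    elapsed,
    progress,
    formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges;
```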
Closing Thought
ClickHouse® performance is predictable once you understand one thing:
It is a system designed to scan less data, not magically process everything faster.
If it reads less, it’s fast.
If it reads more, it’s slow.
So when a query is slow, don’t start by rewriting the query.
Start by asking:
Why is ClickHouse® reading so much data?
That question usually leads you directly to the problem.
References
https://clickhouse.com/docs/sql-reference/statements/select/order-by
https://clickhouse.com/docs/knowledgebase/optimize_final_vs_final
https://clickhouse.com/blog/clickhouse-debugging-issues-with-system-tables
