ClickHouse 25.10: Faster JOINs, QBit Vectors & Smarter SQL

ClickHouse keeps up its monthly release cadence, and 25.10 is another strong release, with join optimizations, more flexible vector search, and new SQL features. Whether you’re building real-time analytics, embedding-based systems, or data-lake-powered pipelines, there are meaningful improvements here.

Here’s a breakdown of what’s new – and why it matters.

Key Features & Why They Matter

1. Smarter, Leaner Joins

Lazy Columns Replication: ClickHouse now avoids blindly copying large column values during joins. Instead, it keeps a compact index pointing back to original data, replicating values only when needed. This dramatically reduces CPU and memory usage.
Bloom Filter Pre-Filtering: For parallel hash joins, ClickHouse builds a runtime Bloom filter on one side’s join key and applies it as a PREWHERE on the other side. The result: up to 2.1× faster queries and ~7× lower memory consumption in benchmarks.
Push-Down of Complex OR Conditions: If your join condition has OR branches (e.g., (a AND b) OR (c AND d)), ClickHouse can now push filters down to both tables even when each branch contains predicates for both sides, which reduces the amount of data scanned.
Automatic Column Statistics: A new table-level setting auto_statistics_types (e.g. minmax, uniq, countmin) automatically generates stats for MergeTree tables, helping the planner pick optimal join orders.
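
To make the Bloom filter pre-filtering idea concrete, here is a minimal Python sketch. It is not ClickHouse's implementation: the BloomFilter class, the filtered_hash_join helper, and the sizes chosen are all hypothetical, picked only to show how a filter built from one side's join keys can discard most rows on the other side before the hash-table probe.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def filtered_hash_join(build_rows, probe_rows):
    """Join (key, value) rows; pre-filter the probe side with a Bloom filter."""
    bloom = BloomFilter()
    hash_table = {}
    for key, value in build_rows:
        bloom.add(key)
        hash_table.setdefault(key, []).append(value)

    # Cheap pre-filter, standing in for the PREWHERE step described above.
    survivors = [row for row in probe_rows if bloom.might_contain(row[0])]

    joined = [
        (key, probe_value, build_value)
        for key, probe_value in survivors
        for build_value in hash_table.get(key, [])
    ]
    return joined, len(survivors)


build = [(1, "a"), (2, "b")]
probe = [(k, f"row{k}") for k in range(100)]
joined, survivors = filtered_hash_join(build, probe)
```

Because a Bloom filter never produces false negatives, every matching probe row survives the pre-filter. The few false positives that slip through are removed by the regular hash-table lookup, so the join result is unchanged.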

Why this matters: Join-heavy workloads (especially in analytics) often suffer from memory bloat and slow performance. These optimizations make ClickHouse more efficient, cost-effective, and scalable for complex analytical pipelines.
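
One way to see why pushing down an OR condition is safe: assume a and c reference only the left table, while b and d reference only the right. Whenever (a AND b) OR (c AND d) holds, (a OR c) must hold for the left row and (b OR d) for the right row. The Python sketch below checks that implication exhaustively. The function names are hypothetical, and the exact rewrite ClickHouse performs may differ.

```python
from itertools import product


def join_condition(a, b, c, d):
    # a and c depend only on the left row; b and d only on the right row.
    return (a and b) or (c and d)


def left_prefilter(a, c):
    return a or c


def right_prefilter(b, d):
    return b or d


# Exhaustively check: whenever the join condition holds, both per-table
# pre-filters hold too, so pre-filtering never drops a matching pair.
safe = all(
    left_prefilter(a, c) and right_prefilter(b, d)
    for a, b, c, d in product([False, True], repeat=4)
    if join_condition(a, b, c, d)
)

# The pre-filters are weaker than the full condition, so the join must still
# evaluate the original predicate. Example: a and d true, b and c false.
still_needs_recheck = (
    left_prefilter(True, False)
    and right_prefilter(False, True)
    and not join_condition(True, False, False, True)
)
```

The derived per-table filters only narrow the input to the join. The original condition is still evaluated on the joined pairs.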

2. QBit: Precision-Tunable Vector Search

New QBit Data Type: This lets you store embedding vectors in a bit-sliced format. You decide, at query time, how many of the most significant bits to use – offering a trade-off between speed and precision.
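Query Example:

CREATE TABLE vectors (id UInt64, name String, vec QBit(BFloat16, 1536)) ORDER BY ();

SELECT id, name FROM vectors ORDER BY L2DistanceTransposed(vec, target, 10) LIMIT 10;

Here target stands for the query vector, and the last argument (10) is the number of most significant bits read for each value.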
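
The bit-sliced idea can be sketched in plain Python. This is a conceptual model under stated assumptions, not ClickHouse's storage format: each value is reduced to a 16-bit BFloat16 pattern (the top 16 bits of its float32 encoding), the patterns are transposed into 16 bit planes, and a distance is computed from only the first `precision` planes. All function names are hypothetical.

```python
import struct


def to_bfloat16_bits(x):
    """Top 16 bits of the IEEE float32 encoding (truncating conversion)."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return as_int >> 16


def from_bfloat16_bits(bits):
    (value,) = struct.unpack(">f", struct.pack(">I", bits << 16))
    return value


def bit_slice(vector):
    """Transpose a vector into 16 planes; planes[0] holds the top bits."""
    patterns = [to_bfloat16_bits(x) for x in vector]
    return [[(p >> (15 - i)) & 1 for p in patterns] for i in range(16)]


def read_with_precision(planes, precision):
    """Rebuild values from the first `precision` planes; other bits are 0."""
    dims = len(planes[0])
    values = []
    for d in range(dims):
        bits = 0
        for i in range(precision):
            bits |= planes[i][d] << (15 - i)
        values.append(from_bfloat16_bits(bits))
    return values


def l2_distance(stored_planes, target, precision):
    approx = read_with_precision(stored_planes, precision)
    return sum((a - t) ** 2 for a, t in zip(approx, target)) ** 0.5


planes = bit_slice([1.0, -2.5, 0.75])
target = [1.0, -2.0, 0.5]
full = l2_distance(planes, target, 16)    # all 16 bits read
coarse = l2_distance(planes, target, 10)  # only the top 10 bits read
```

With 10 bits, the stored -2.5 is read back as -2.0 because its lower mantissa bit is never touched. That is the trade-off in miniature: fewer planes read means less data scanned, at the cost of a coarser distance.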

Why this matters: For embedding-based applications (recommendation, semantic search, ML), QBit gives you more control. You can optimize for precision when needed, or prioritize speed and memory efficiency.

3. Late Materialization of Secondary Indices

ClickHouse now supports delaying index building (like vector similarity indices) until background merges, instead of building them eagerly on insert.
You can control this via settings like exclude_materialize_skip_indexes_on_insert or disable building on merge with materialize_skip_indexes_on_merge.

Why this matters: Index building can be expensive in terms of time and storage. Delaying it gives more flexibility, especially in high-ingest systems or when working with large embeddings.

4. SQL Enhancements: More Expressive & Flexible

<=> Operator (IS NOT DISTINCT FROM): This treats NULL as equal to NULL, giving you more precise semantics in comparisons.
Negative LIMIT / OFFSET: You can now use LIMIT -N to fetch the last N rows of a result while keeping them in the query's original order (for example, ascending) rather than reversing the sort.
LIMIT BY ALL: Allows limiting duplicate rows – useful when you don’t want distinct-by semantics but want to cap repetition.
Base Conversion Function: New conv() function to convert numbers between bases (like MySQL’s).
Table Aliases: Alias engine support lets you create lightweight aliases for tables.
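
The semantics of several of these features can be modelled in a few lines of Python. This is an illustration of the behaviour as described in the list above, not ClickHouse code. None stands in for NULL, rows are tuples, all helper names are hypothetical, and edge cases such as negative numbers in conv() are not modelled.

```python
from collections import Counter


def null_safe_equal(a, b):
    """Model of <=> : NULL (None) compares equal to NULL, never unknown."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def limit_negative(rows, n):
    """Model of LIMIT -n: keep the last n rows, in their existing order."""
    return rows[-n:] if n > 0 else []


def limit_by_all(rows, n):
    """Model of LIMIT n BY ALL: at most n copies of each identical row."""
    seen = Counter()
    kept = []
    for row in rows:  # rows must be hashable, e.g. tuples
        if seen[row] < n:
            seen[row] += 1
            kept.append(row)
    return kept


DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def conv(number, from_base, to_base):
    """Model of conv(): re-express a non-negative integer string in another base."""
    value = int(number, from_base)
    if value == 0:
        return "0"
    out = []
    while value:
        value, digit = divmod(value, to_base)
        out.append(DIGITS[digit])
    return "".join(reversed(out))
```

For instance, null_safe_equal(None, None) is true where an ordinary SQL = comparison would yield NULL, and conv("ff", 16, 2) re-expresses hexadecimal ff in binary.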

Why this matters: These syntax enhancements make ClickHouse SQL more expressive, letting you write cleaner, more intuitive queries – especially for complex data transformations or analytics.

5. Arrow Flight Enhancements

Full server + client compatibility: ClickHouse 25.10 allows you to run an Arrow Flight server and query it using a Flight client.
Example config:

arrowflight_port: 6379
arrowflight:
  enable_ssl: false
  auth_required: false

Then you can query:

SELECT max(price), count() FROM arrowflight('localhost:6379', 'uk_price_paid', 'default', '');

Why this matters: Arrow Flight enables efficient, cross-language data exchange. This is huge for integrating ClickHouse into data ecosystems that leverage Arrow (e.g., Python, Java, Rust analytics tools).

Risks, Caveats & Upgrade Notes

JDBC Compatibility: There is a reported issue with the JDBC driver breaking for version 25.10. If your applications use JDBC clients, test carefully before upgrading.
Backward-Incompatible Change: The default schema_inference_make_columns_nullable setting has changed – it now respects Nullable-ness from Parquet/Arrow metadata.
Feature Maturity: While many features are stable, some (like QBit) are still relatively new; production users should test in staging for performance and correctness.
Tuning Required: Settings like enable_lazy_columns_replication, enable_join_runtime_filters, and auto_statistics_types may need tuning depending on your workload.

Why This Release Is a Big Deal

For Data Engineers: Massive join efficiency gains, less CPU/memory waste, and smarter planning via automatic statistics.
For ML/AI Teams: QBit makes ClickHouse a more powerful candidate for embedding-based workloads, with precision tuning.
For Analytics/BI: Better SQL flexibility and Arrow Flight support make it easier to build interoperable, high-performance pipelines.
For Ops: Delayed index building lowers the operational burden of large secondary indices; new SQL features make day-to-day querying more robust.

Final Thoughts

ClickHouse 25.10 cements its position not just as a blazing-fast OLAP database, but as a sophisticated platform for analytic joins and vector workloads. The join optimizations alone can significantly reduce cost and latency, while QBit unlocks precision-tunable vector search.

If you’re upgrading, test join-heavy workloads and embedding workloads carefully. But once you’re on 25.10, you should see real, tangible improvements – especially if you’re pushing ClickHouse into ML-driven or lakehouse architectures.
