ClickHouse continues its blistering monthly cadence, and the 25.10 release is another strong showing – packed with join optimizations, flexibility in vector search, and powerful SQL enhancements. Whether you’re building real-time analytics, embedding-based systems, or data-lake-powered pipelines, there are meaningful improvements here.
Here’s a breakdown of what’s new – and why it matters.
Key Features & Why They Matter
1. Smarter, Leaner Joins
- Lazy Columns Replication: ClickHouse now avoids blindly copying large column values during joins. Instead, it keeps a compact index pointing back to original data, replicating values only when needed. This dramatically reduces CPU and memory usage.
- Bloom Filter Pre-Filtering: For parallel hash joins, ClickHouse builds a runtime Bloom filter on one side’s join key and applies it as a PREWHERE on the other side. The result: up to 2.1× faster queries and ~7× lower memory consumption in benchmarks.
- Push-Down of Complex OR Conditions: If your join conditions have OR branches (e.g., (a AND b) OR (c AND d)), ClickHouse can now push filters to both tables even when each branch has predicates for both sides, which significantly reduces the amount of data scanned.
- Automatic Column Statistics: A new table-level setting, auto_statistics_types (e.g. minmax, uniq, countmin), automatically generates stats for MergeTree tables, helping the planner pick optimal join orders.
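The lazy replication idea from the first bullet can be pictured with a small Python model. This is only a conceptual sketch, not ClickHouse's implementation: the class name LazyReplicatedColumn and its methods are hypothetical, invented here for illustration.

```python
class LazyReplicatedColumn:
    """Hypothetical model: a joined column kept as (source values, row indexes)."""

    def __init__(self, source, row_indexes):
        self.source = source            # original column values, stored once
        self.row_indexes = row_indexes  # one small integer per joined output row

    def __len__(self):
        return len(self.row_indexes)

    def __getitem__(self, i):
        # Look the value up on demand instead of holding a copy per output row.
        return self.source[self.row_indexes[i]]

    def materialize(self):
        # Only when a later step needs a flat column: build the replicated list.
        return [self.source[i] for i in self.row_indexes]


# Two large strings on one side of a join, matched by four rows on the other side.
payloads = ["x" * 10_000, "y" * 10_000]
joined = LazyReplicatedColumn(payloads, [0, 0, 1, 0])
```

The point of the model is that the join output carries four integers rather than four copies of a 10,000-character value; the copies are only produced if materialize() is called.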
Why this matters: Join-heavy workloads (especially in analytics) often suffer from memory bloat and slow performance. These optimizations make ClickHouse more efficient, cost-effective, and scalable for complex analytical pipelines.
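To make the Bloom filter pre-filtering step more concrete, here is a minimal Python sketch of the general technique: build a Bloom filter from one side's join keys, then drop rows from the other side that cannot possibly match before probing the hash table. Everything here is an assumption for illustration (the BloomFilter class, the filter size, the use of SHA-256, and the function name join_with_runtime_filter); it does not describe how ClickHouse sizes or hashes its filters.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def join_with_runtime_filter(build_rows, probe_rows, key):
    """Inner hash join that pre-filters the probe side with a Bloom filter."""
    bloom = BloomFilter()
    hash_table = {}
    for row in build_rows:
        bloom.add(row[key])
        hash_table.setdefault(row[key], []).append(row)

    # Cheap pre-filter: rows that fail the Bloom check can never join.
    survivors = [row for row in probe_rows if bloom.might_contain(row[key])]

    joined = [{**b, **p} for p in survivors for b in hash_table.get(p[key], [])]
    return joined, len(survivors)


orders = [{"user_id": 1, "total": 30}, {"user_id": 2, "total": 12}]
events = [{"user_id": u, "event": "click"} for u in range(1, 101)]
joined_rows, probed = join_with_runtime_filter(orders, events, "user_id")
```

Because a Bloom filter never yields false negatives, the join result is unchanged; the saving comes from how few of the 100 probe rows reach the hash table lookup.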
2. QBit: Precision-Tunable Vector Search
- New QBit Data Type: This lets you store embedding vectors in a bit-sliced format. You decide, at query time, how many of the most significant bits to use – offering a trade-off between speed and precision.
- Query Example:
    CREATE TABLE vectors
    (
        id UInt64,
        name String,
        vec QBit(BFloat16, 1536)
    )
    ORDER BY ();

    SELECT id, name
    FROM vectors
    ORDER BY L2DistanceTransposed(vec, target, 10)
    LIMIT 10;
Why this matters: For embedding-based applications (recommendation, semantic search, ML), QBit gives you more control. You can optimize for precision when needed, or prioritize speed and memory efficiency.
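The speed-versus-precision trade-off of a bit-sliced layout can be illustrated with a short Python model. It is a sketch under stated assumptions, not QBit's actual storage format: it uses float32 bit patterns (the standard library has no BFloat16 type; BFloat16 corresponds to the top 16 bits of a float32), and the helper names to_bit_planes, read_top_bits, and l2_distance_top_bits are hypothetical.

```python
import math
import struct


def float_to_bits(x):
    """Return the 32-bit pattern of x stored as float32."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(pattern):
    return struct.unpack(">f", struct.pack(">I", pattern))[0]


def to_bit_planes(vector, width=32):
    """Plane 0 holds the most significant bit of every element, plane 1 the next, ..."""
    patterns = [float_to_bits(x) for x in vector]
    return [[(p >> (width - 1 - plane)) & 1 for p in patterns]
            for plane in range(width)]


def read_top_bits(planes, bits, width=32):
    """Rebuild approximate values from only the first `bits` planes."""
    values = []
    for i in range(len(planes[0])):
        pattern = 0
        for plane in range(bits):
            pattern |= planes[plane][i] << (width - 1 - plane)
        values.append(bits_to_float(pattern))
    return values


def l2_distance_top_bits(planes, query, bits):
    """L2 distance between the query and the stored vector read at reduced precision."""
    approx = read_top_bits(planes, bits)
    return math.sqrt(sum((a - q) ** 2 for a, q in zip(approx, query)))


stored = [1.75, -0.5, 3.0]
planes = to_bit_planes(stored)
coarse = read_top_bits(planes, 10)   # sign + exponent + 1 mantissa bit
exact = read_top_bits(planes, 32)    # every plane
```

Reading fewer planes touches less data per vector, at the cost of values like 1.75 being seen as 1.5; that is the knob the query-time precision argument exposes.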
3. Late Materialization of Secondary Indices
- ClickHouse now supports delaying index building (like vector similarity indices) until background merges, instead of building them eagerly on insert.
- You can control this via settings like exclude_materialize_skip_indexes_on_insert or disable building on merge with materialize_skip_indexes_on_merge.
Why this matters: Index building can be expensive in terms of time and storage. Delaying it gives more flexibility, especially in high-ingest systems or when working with large embeddings.
4. SQL Enhancements: More Expressive & Flexible
- <=> Operator (IS NOT DISTINCT FROM): This treats NULL as equal to NULL, giving you more precise semantics in comparisons.
- Negative LIMIT / OFFSET: You can now do LIMIT -N to fetch the last N rows but return them in ascending order.
- LIMIT BY ALL: Allows limiting duplicate rows – useful when you don’t want distinct-by semantics but want to cap repetition.
- Base Conversion Function: New conv() function to convert numbers between bases (like MySQL’s).
- Table Aliases: Alias engine support lets you create lightweight aliases for tables.
Why this matters: These syntax enhancements make ClickHouse SQL more expressive, letting you write cleaner, more intuitive queries – especially for complex data transformations or analytics.
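For readers who want to pin down the semantics of the new SQL features, the following Python functions act as a rough reference model. They are assumptions based on the descriptions above and on standard SQL and MySQL behavior, not on ClickHouse's source: None stands in for NULL, the function names are hypothetical, the LIMIT BY ALL model assumes "at most n copies of each identical row", and the uppercase output of conv mirrors MySQL's CONV and may differ from ClickHouse in edge cases.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def sql_eq(a, b):
    """Model of plain `a = b`: any NULL operand makes the result NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """Model of `a <=> b`: NULL equals NULL, and the result is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def last_n(rows, n):
    """Model of `LIMIT -n` on ordered rows: keep the last n, order preserved."""
    return rows[-n:] if n > 0 else []


def limit_by_all(rows, n):
    """Model of `LIMIT n BY ALL`: keep at most n copies of each identical row."""
    seen = {}
    kept = []
    for row in rows:
        count = seen.get(row, 0)
        if count < n:
            kept.append(row)
        seen[row] = count + 1
    return kept


def conv(number, from_base, to_base):
    """Convert a non-negative number given as a string between bases 2..36."""
    value = int(number, from_base)
    if value < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, digit = divmod(value, to_base)
        out.append(DIGITS[digit])
    return "".join(reversed(out))
```

For example, null_safe_eq(None, None) is True while sql_eq(None, None) stays None, which is exactly the difference that matters when joining or filtering on nullable keys.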
5. Arrow Flight Enhancements
- Full server + client compatibility: ClickHouse 25.10 allows you to run an Arrow Flight server and query it using a Flight client.
- Example config:
    arrowflight_port: 6379
    arrowflight:
      enable_ssl: false
      auth_required: false
Then you can query:
    SELECT max(price), count()
    FROM arrowflight('localhost:6379', 'uk_price_paid', 'default', '');
Why this matters: Arrow Flight enables efficient, cross-language data exchange. This is huge for integrating ClickHouse into data ecosystems that leverage Arrow (e.g., Python, Java, Rust analytics tools).
Risks, Caveats & Upgrade Notes
- JDBC Compatibility: There is a reported issue with the JDBC driver breaking for version 25.10. If your applications use JDBC clients, test carefully before upgrading.
- Backward-Incompatible Change: The default schema_inference_make_columns_nullable setting has changed – it now respects Nullable-ness from Parquet/Arrow metadata.
- Feature Maturity: While many features are stable, some (like QBit) are still relatively new; production users should test in staging for performance and correctness.
- Tuning Required: Settings like enable_lazy_columns_replication, enable_join_runtime_filters, and auto_statistics_types may need tuning depending on your workload.
Why This Release Is a Big Deal
- For Data Engineers: Massive join efficiency gains, less CPU/memory waste, and smarter planning via automatic statistics.
- For ML/AI Teams: QBit makes ClickHouse a more powerful candidate for embedding-based workloads, with precision tuning.
- For Analytics/BI: Better SQL flexibility and Arrow Flight support make it easier to build interoperable, high-performance pipelines.
- For Ops: Delayed index building lowers the operational burden of large secondary indices; new SQL features make day-to-day querying more robust.
Final Thoughts
ClickHouse 25.10 cements its position not just as a blazing-fast OLAP database, but as a sophisticated platform for analytic joins and vector workloads. The join optimizations alone can significantly reduce cost and latency, while QBit unlocks precision-tunable vector search.
If you’re upgrading, test join-heavy workloads and embedding workloads carefully. But once you’re on 25.10, you should see real, tangible improvements – especially if you’re pushing ClickHouse into ML-driven or lakehouse architectures.
Looking for Expert ClickHouse Solutions?
At Quantrail Data, we offer:
- Fully managed ClickHouse services
- Seamless migration support
- Performance optimization and consulting
Whether you’re deploying ClickHouse at scale, integrating geospatial or lakehouse pipelines, or just want expert backup – we’re here to help.
Let’s unlock better analytics together.
