Data aggregation is one of the most common operations in analytical workloads. Whether you’re calculating daily revenue, counting active users, measuring API requests, or building dashboards, aggregation allows you to summarize large volumes of raw data into meaningful insights.
ClickHouse® is designed for high-performance analytical processing and provides a rich set of aggregation capabilities. Its columnar storage architecture, vectorized query execution, and parallel processing model make aggregate queries significantly faster than traditional row-based databases.
This article introduces the fundamentals of data aggregation in ClickHouse®, covering aggregate functions, GROUP BY operations, multi-dimensional aggregations, and performance considerations.
What is Data Aggregation?
Data aggregation is the process of combining multiple rows of data into a summarized result.
Consider a table storing e-commerce orders:
CREATE TABLE orders
(
order_id UInt64,
customer_id UInt64,
category String,
amount Float64,
created_at DateTime
)
ENGINE = MergeTree
ORDER BY created_at;
Sample data:
| order_id | customer_id | category | amount | created_at |
|---|---|---|---|---|
| 1 | 101 | Electronics | 500.31 | 2026-09-30 10:40:36 |
| 2 | 102 | Books | 50.89 | 2026-09-30 11:20:50 |
| 3 | 101 | Electronics | 300.76 | 2026-09-30 11:22:22 |
| 4 | 103 | Books | 100.01 | 2026-09-30 16:06:59 |
Instead of analyzing every individual order, aggregation allows us to answer questions such as:
- Total revenue
- Number of orders
- Average order value
- Revenue per category
- Orders per customer
Aggregate Functions in ClickHouse®
Aggregate functions compute a single result from multiple rows.
COUNT
Counts rows.
SELECT count()
FROM orders;
Result:
4
To count distinct customers:
SELECT count(DISTINCT customer_id)
FROM orders;
Result:
3
SUM
Calculates the total value.
SELECT sum(amount)
FROM orders;
Result:
951.97
AVG
Calculates the average value.
SELECT avg(amount)
FROM orders;
Result:
237.9925
MIN and MAX
Find the smallest and largest values.
SELECT
min(amount) AS minimum_order,
max(amount) AS maximum_order
FROM orders;
Result:
50.89, 500.31
uniq()
ClickHouse provides specialized functions for distinct counting.
SELECT uniq(customer_id)
FROM orders;
Result:
3
Unlike traditional COUNT(DISTINCT), ClickHouse® offers multiple distinct-counting algorithms optimized for different accuracy and performance requirements.
Examples include:
uniq()
uniqExact()
uniqCombined()
uniqHLL12()
Choosing the appropriate function depends on accuracy requirements and dataset size.
Grouping Data with GROUP BY
Aggregate functions become more useful when combined with GROUP BY.
Suppose we want revenue by category.
SELECT
category,
sum(amount) AS revenue
FROM orders
GROUP BY category;
Result:
| category | revenue |
|---|---|
| Electronics | 801.0699999999999 |
| Books | 150.9 |
The GROUP BY clause creates one aggregation bucket for each unique category.
Aggregating Across Multiple Dimensions
ClickHouse® supports grouping on multiple columns.
Example:
SELECT
category,
customer_id,
sum(amount) AS total_spent
FROM orders
GROUP BY
category,
customer_id;
Result:
| category | customer_id | total_spent |
|---|---|---|
| Books | 103 | 100.01 |
| Electronics | 101 | 801.0699999999999 |
| Books | 102 | 50.89 |
This allows analysis across multiple business dimensions simultaneously.
Time-Based Aggregation
Time-series aggregation is one of the most common ClickHouse® use cases.
Suppose we want daily revenue.
SELECT
toDate(created_at) AS day,
sum(amount) AS revenue
FROM orders
GROUP BY day
ORDER BY day;
Result:
┌────────day─┬─revenue─┐
1. │ 2026-09-30 │ 951.97 │
└────────────┴─────────┘
For hourly aggregation:
SELECT
toStartOfHour(created_at) AS hour,
count() AS orders
FROM orders
GROUP BY hour
ORDER BY hour;
Result:
┌────────────────hour─┬─orders─┐
1. │ 2026-09-30 10:00:00 │ 1 │
2. │ 2026-09-30 11:00:00 │ 2 │
3. │ 2026-09-30 16:00:00 │ 1 │
└─────────────────────┴────────┘
Common time bucketing functions include:
toStartOfMinute()
toStartOfHour()
toStartOfDay()
toStartOfWeek()
toStartOfMonth()
These functions are heavily optimized for analytical workloads.
Conditional Aggregation
ClickHouse® supports conditional aggregation using aggregate combinators.
Example:
SELECT
count() AS total_orders,
countIf(amount > 100) AS large_orders
FROM orders;
Result:
total_orders = 4
large_orders = 3
Similarly:
SELECT
sumIf(amount, category = 'Electronics') AS electronics_revenue
FROM orders;
Result:
801.0699999999999
This avoids multiple table scans and improves query efficiency.
Multi-Level Aggregation with ROLLUP
ROLLUP generates subtotals and grand totals automatically.
SELECT
category,
sum(amount) AS revenue
FROM orders
GROUP BY ROLLUP(category);
Example output:
| category | revenue |
|---|---|
| Books | 150.9 |
| Electronics | 801.0699999999999 |
| NULL | 951.9699999999999 |
The final row represents the grand total.
ROLLUP is useful for reporting systems and dashboard summaries.
CUBE Aggregation
CUBE generates aggregations for every possible combination of dimensions.
Example:
SELECT
category,
customer_id,
sum(amount)
FROM orders
GROUP BY CUBE(category, customer_id);
Result:
┌─category────┬─customer_id─┬───────sum(amount)─┐
1. │ Books │ 103 │ 100.01 │
2. │ Electronics │ 101 │ 801.0699999999999 │
3. │ Books │ 102 │ 50.89 │
4. │ Books │ 0 │ 150.9 │
5. │ Electronics │ 0 │ 801.0699999999999 │
6. │ │ 101 │ 801.0699999999999 │
7. │ │ 103 │ 100.01 │
8. │ │ 102 │ 50.89 │
9. │ │ 0 │ 951.9699999999999 │
└─────────────┴─────────────┴───────────────────┘
This produces:
- Revenue by category
- Revenue by customer
- Revenue by category and customer
- Overall revenue
CUBE is particularly useful for OLAP-style analysis and multidimensional reporting.
Aggregation States
One of ClickHouse®’s unique capabilities is storing intermediate aggregation states.
Example:
SELECT
sumState(amount)
FROM orders;
This returns an aggregation state instead of a final value.
The state can later be merged using sumMerge() function.
Aggregation states form the foundation of:
- Materialized Views
- AggregatingMergeTree
- Incremental aggregations
- Real-time analytics pipelines
For large-scale workloads, storing aggregation states can dramatically reduce query latency.
Handling NULL Values
During aggregation, ClickHouse® generally skips NULL values.
Example:
SELECT avg(nullable_column)
FROM table_name;
NULL entries are ignored when computing the result.
Understanding NULL behavior is important when building analytical pipelines because aggregate results may differ from traditional SQL implementations.
Why Aggregation is Fast in ClickHouse®
ClickHouse® achieves high aggregation performance through several architectural optimizations:
Columnar Storage
Only the columns referenced in a query are read from disk.
SELECT category, sum(amount)
FROM orders
GROUP BY category;
The database only reads:
- category
- amount
instead of the entire row.
Parallel Execution
ClickHouse® executes GROUP BY operations across multiple CPU cores simultaneously.
Each processing thread builds partial aggregation states independently before merging them into a final result.
Specialized Hash Tables
The query engine automatically selects optimized aggregation algorithms and hash table implementations depending on:
- Data types
- Cardinality
- Query complexity
- Memory requirements
This significantly improves aggregation throughput on large datasets.
Best Practices
Aggregate as Early as Possible
Instead of returning raw events:
SELECT *
FROM events;
Prefer:
SELECT
event_type,
count()
FROM events
GROUP BY event_type;
This reduces data movement and memory usage.
Use Approximate Functions When Appropriate
For very large datasets:
uniq()
is often much faster than:
uniqExact()
while providing acceptable accuracy.
Consider Materialized Views
Frequently queried aggregations can be precomputed using materialized views.
This reduces query latency and lowers compute costs.
Exploring ClickHouse® for Your Analytics?
At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics – from Kubernetes deployments and migrations to performance tuning in production.
We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production – a story we’ve shared here:
Success Story: Quantrail Bare-Metal ClickHouse® Deployment
If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.
Contact
Quantrail Data
Conclusion
Aggregation is the foundation of analytical querying, and ClickHouse® provides an extensive set of tools to perform it efficiently. From basic functions such as count(), sum(), and avg() to advanced capabilities like ROLLUP, CUBE, aggregation states, and materialized views, ClickHouse® is built to process aggregations at scale.
Understanding these building blocks is essential before moving into more advanced topics such as aggregate combinators, materialized view pipelines, AggregatingMergeTree tables, and real-time pre-aggregation strategies.
For most analytical workloads, mastering aggregation is the first step toward unlocking the full performance potential of ClickHouse®.
References
Official ClickHouse® Documentation – https://clickhouse.com/docs
Suggested Articles
What is ClickHouse®? A Beginner’s Guide to the OLAP Database
Getting Started with ClickHouse®: OSS – Installation and Setup
ClickHouse vs PostgreSQL: When to Use Which?
Understanding Column-Oriented Databases: The ClickHouse Advantage
Your First ClickHouse® Query: A Step-by-Step Tutorial
How to Load CSV Data into ClickHouse®
ClickHouse® Data Types: Explained for Beginners
Introduction to ClickHouse® MergeTree Engines
