Basic Data Aggregation: GROUP BY, SUM, COUNT & More

Data aggregation is one of the most common operations in analytical workloads. Whether you’re calculating daily revenue, counting active users, measuring API requests, or building dashboards, aggregation allows you to summarize large volumes of raw data into meaningful insights.

ClickHouse® is designed for high-performance analytical processing and provides a rich set of aggregation capabilities. Its columnar storage architecture, vectorized query execution, and parallel processing model make aggregate queries significantly faster than traditional row-based databases.

This article introduces the fundamentals of data aggregation in ClickHouse®, covering aggregate functions, GROUP BY operations, multi-dimensional aggregations, and performance considerations.

What is Data Aggregation?

Data aggregation is the process of combining multiple rows of data into a summarized result.

Consider a table storing e-commerce orders:

CREATE TABLE orders
(
    order_id UInt64,
    customer_id UInt64,
    category String,
    amount Float64,
    created_at DateTime
)
ENGINE = MergeTree
ORDER BY created_at;

Sample data:

order_id	customer_id	category	amount	created_at
1	101	Electronics	500.31	2026-09-30 10:40:36
2	102	Books	50.89	2026-09-30 11:20:50
3	101	Electronics	300.76	2026-09-30 11:22:22
4	103	Books	100.01	2026-09-30 16:06:59

Instead of analyzing every individual order, aggregation allows us to answer questions such as:

Total revenue
Number of orders
Average order value
Revenue per category
Orders per customer

Aggregate Functions in ClickHouse®

Aggregate functions compute a single result from multiple rows.

COUNT

Counts rows.

SELECT count()
FROM orders;

Result:

To count distinct customers:

SELECT count(DISTINCT customer_id)
FROM orders;

Result:

SUM

Calculates the total value.

SELECT sum(amount)
FROM orders;

Result:

951.97

AVG

Calculates the average value.

SELECT avg(amount)
FROM orders;

Result:

237.9925

MIN and MAX

Find the smallest and largest values.

SELECT
    min(amount) AS minimum_order,
    max(amount) AS maximum_order
FROM orders;

Result:

50.89,  500.31

uniq()

ClickHouse provides specialized functions for distinct counting.

SELECT uniq(customer_id)
FROM orders;

Result:

Unlike traditional COUNT(DISTINCT), ClickHouse® offers multiple distinct-counting algorithms optimized for different accuracy and performance requirements.

Examples include:

uniq()
uniqExact()
uniqCombined()
uniqHLL12()

Choosing the appropriate function depends on accuracy requirements and dataset size.

Grouping Data with GROUP BY

Aggregate functions become more useful when combined with GROUP BY.

Suppose we want revenue by category.

SELECT
    category,
    sum(amount) AS revenue
FROM orders
GROUP BY category;

Result:

category	revenue
Electronics	801.0699999999999
Books	150.9

The GROUP BY clause creates one aggregation bucket for each unique category.

Aggregating Across Multiple Dimensions

ClickHouse® supports grouping on multiple columns.

Example:

SELECT
    category,
    customer_id,
    sum(amount) AS total_spent
FROM orders
GROUP BY
    category,
    customer_id;

Result:

category	customer_id	total_spent
Books	103	100.01
Electronics	101	801.0699999999999
Books	102	50.89

This allows analysis across multiple business dimensions simultaneously.

Time-Based Aggregation

Time-series aggregation is one of the most common ClickHouse® use cases.

Suppose we want daily revenue.

SELECT
    toDate(created_at) AS day,
    sum(amount) AS revenue
FROM orders
GROUP BY day
ORDER BY day;

Result:

┌────────day─┬─revenue─┐
1. │ 2026-09-30 │  951.97 │
   └────────────┴─────────┘

For hourly aggregation:

SELECT
    toStartOfHour(created_at) AS hour,
    count() AS orders
FROM orders
GROUP BY hour
ORDER BY hour;

Result:

   ┌────────────────hour─┬─orders─┐
1. │ 2026-09-30 10:00:00 │      1 │
2. │ 2026-09-30 11:00:00 │      2 │
3. │ 2026-09-30 16:00:00 │      1 │
   └─────────────────────┴────────┘

Common time bucketing functions include:

toStartOfMinute()
toStartOfHour()
toStartOfDay()
toStartOfWeek()
toStartOfMonth()

These functions are heavily optimized for analytical workloads.

Conditional Aggregation

ClickHouse® supports conditional aggregation using aggregate combinators.

Example:

SELECT
    count() AS total_orders,
    countIf(amount > 100) AS large_orders
FROM orders;

Result:

total_orders = 4
large_orders = 3

Similarly:

SELECT
    sumIf(amount, category = 'Electronics') AS electronics_revenue
FROM orders;

Result:

801.0699999999999

This avoids multiple table scans and improves query efficiency.

Multi-Level Aggregation with ROLLUP

ROLLUP generates subtotals and grand totals automatically.

SELECT
    category,
    sum(amount) AS revenue
FROM orders
GROUP BY ROLLUP(category);

Example output:

category	revenue
Books	150.9
Electronics	801.0699999999999
NULL	951.9699999999999

The final row represents the grand total.

ROLLUP is useful for reporting systems and dashboard summaries.

CUBE Aggregation

CUBE generates aggregations for every possible combination of dimensions.

Example:

SELECT
    category,
    customer_id,
    sum(amount)
FROM orders
GROUP BY CUBE(category, customer_id);

Result:

   ┌─category────┬─customer_id─┬───────sum(amount)─┐
1. │ Books       │         103 │            100.01 │
2. │ Electronics │         101 │ 801.0699999999999 │
3. │ Books       │         102 │             50.89 │
4. │ Books       │           0 │             150.9 │
5. │ Electronics │           0 │ 801.0699999999999 │
6. │             │         101 │ 801.0699999999999 │
7. │             │         103 │            100.01 │
8. │             │         102 │             50.89 │
9. │             │           0 │ 951.9699999999999 │
   └─────────────┴─────────────┴───────────────────┘

This produces:

Revenue by category
Revenue by customer
Revenue by category and customer
Overall revenue

CUBE is particularly useful for OLAP-style analysis and multidimensional reporting.

Aggregation States

One of ClickHouse®’s unique capabilities is storing intermediate aggregation states.

Example:

SELECT
    sumState(amount)
FROM orders;

This returns an aggregation state instead of a final value.

The state can later be merged using sumMerge() function.

Aggregation states form the foundation of:

Materialized Views
AggregatingMergeTree
Incremental aggregations
Real-time analytics pipelines

For large-scale workloads, storing aggregation states can dramatically reduce query latency.

Handling NULL Values

During aggregation, ClickHouse® generally skips NULL values.

Example:

SELECT avg(nullable_column)
FROM table_name;

NULL entries are ignored when computing the result.

Understanding NULL behavior is important when building analytical pipelines because aggregate results may differ from traditional SQL implementations.

Why Aggregation is Fast in ClickHouse®

ClickHouse® achieves high aggregation performance through several architectural optimizations:

Columnar Storage

Only the columns referenced in a query are read from disk.

SELECT category, sum(amount)
FROM orders
GROUP BY category;

The database only reads:

category
amount

instead of the entire row.

Parallel Execution

ClickHouse® executes GROUP BY operations across multiple CPU cores simultaneously.

Each processing thread builds partial aggregation states independently before merging them into a final result.

Specialized Hash Tables

The query engine automatically selects optimized aggregation algorithms and hash table implementations depending on:

Data types
Cardinality
Query complexity
Memory requirements

This significantly improves aggregation throughput on large datasets.

Best Practices

Aggregate as Early as Possible

Instead of returning raw events:

SELECT *
FROM events;

Prefer:

SELECT
    event_type,
    count()
FROM events
GROUP BY event_type;

This reduces data movement and memory usage.

Use Approximate Functions When Appropriate

For very large datasets:

uniq()

is often much faster than:

uniqExact()

while providing acceptable accuracy.

Consider Materialized Views

Frequently queried aggregations can be precomputed using materialized views.

This reduces query latency and lowers compute costs.

Exploring ClickHouse® for Your Analytics?

At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics – from Kubernetes deployments and migrations to performance tuning in production.

We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production – a story we’ve shared here:
Success Story: Quantrail Bare-Metal ClickHouse® Deployment

If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.

Contact
Quantrail Data

Conclusion

Aggregation is the foundation of analytical querying, and ClickHouse® provides an extensive set of tools to perform it efficiently. From basic functions such as count(), sum(), and avg() to advanced capabilities like ROLLUP, CUBE, aggregation states, and materialized views, ClickHouse® is built to process aggregations at scale.

Understanding these building blocks is essential before moving into more advanced topics such as aggregate combinators, materialized view pipelines, AggregatingMergeTree tables, and real-time pre-aggregation strategies.

For most analytical workloads, mastering aggregation is the first step toward unlocking the full performance potential of ClickHouse®.

References

Official ClickHouse® Documentation – https://clickhouse.com/docs

Quantrail Data

Basic Data Aggregation in ClickHouse®

What is Data Aggregation?

Aggregate Functions in ClickHouse®

COUNT

SUM

AVG

MIN and MAX

uniq()

Grouping Data with GROUP BY

Aggregating Across Multiple Dimensions

Time-Based Aggregation

Conditional Aggregation

Multi-Level Aggregation with ROLLUP

CUBE Aggregation

Aggregation States

Handling NULL Values

Why Aggregation is Fast in ClickHouse®

Columnar Storage

Parallel Execution

Specialized Hash Tables

Best Practices

Aggregate as Early as Possible

Use Approximate Functions When Appropriate

Consider Materialized Views

Exploring ClickHouse® for Your Analytics?

Conclusion

References

Suggested Articles

Quantrail Data