All posts
Mastering ClickHouse® AggregatingMergeTree

Mastering ClickHouse® AggregatingMergeTree

July 2, 20267 min readGayathri
Share:

Introduction

As organizations collect more analytical data, generating reports from raw datasets becomes increasingly expensive. Dashboards that calculate daily sales, website traffic, user activity, or business metrics often execute the same aggregation queries repeatedly. Although ClickHouse® is designed for high-speed analytics, repeatedly scanning millions or billions of rows still consumes CPU, memory, and disk resources.

This is where AggregatingMergeTree becomes valuable.

Instead of recalculating aggregates every time a query runs, AggregatingMergeTree stores intermediate aggregate states that are merged incrementally in the background. As a result, reports and dashboards can retrieve summarized data much faster while reducing the computational overhead on the database.

In this article, we'll explore how AggregatingMergeTree works, understand aggregate states, build a practical implementation using Materialized Views, compare it with MergeTree, and review best practices for designing scalable analytical applications.

1. What is AggregatingMergeTree?

AggregatingMergeTree is a specialized table engine in ClickHouse designed for storing pre-aggregated data.

Unlike a standard MergeTree table, which stores every inserted row, AggregatingMergeTree stores aggregate function states. During background merge operations, ClickHouse combines these states automatically, allowing analytical queries to read much smaller datasets.

This engine is particularly useful when the same aggregation queries are executed repeatedly, such as those powering dashboards, reports, and business intelligence applications.

Typical use cases include:

  • Daily sales reports
  • Website analytics
  • IoT sensor summaries
  • Financial reporting
  • Business KPIs
  • Monitoring dashboards

2. Why Do We Need AggregatingMergeTree?

Imagine an e-commerce platform that records millions of orders every day.

A reporting dashboard needs to display total sales for each product.

A typical query might look like:

SELECT
    product,
    sum(amount) AS total_sales
FROM sales
GROUP BY product;

This query works well on small datasets. However, when the sales table grows to billions of rows, ClickHouse must scan and aggregate all matching records every time the dashboard refreshes.

Even though ClickHouse is highly optimized, repeatedly processing massive datasets is inefficient.

AggregatingMergeTree solves this challenge by storing aggregated information as data is inserted. Instead of recalculating totals from scratch, queries simply read the pre-aggregated results, dramatically reducing execution time.

3. How Does AggregatingMergeTree Work?

The workflow typically follows these steps:

  1. Raw records are inserted into a MergeTree table.
  2. A Materialized View computes aggregate states.
  3. AggregatingMergeTree stores those aggregate states.
  4. ClickHouse merges compatible states automatically in the background.
  5. Queries finalize the results using merge functions.
Application


Raw MergeTree Table


Materialized View


Aggregate States
(sumState(), avgState())


AggregatingMergeTree


Background Merge


sumMerge()


Dashboard / Report

The advantage is that the expensive aggregation work happens once during data ingestion rather than every time a report is generated.

4. Aggregate Functions vs. AggregatingMergeTree

These two concepts are related but serve different purposes.

Aggregate FunctionsAggregatingMergeTree
SQL functionsTable engine
Calculate results during query executionStores aggregate states
Used in SELECT queriesUsed for pre-aggregated summary tables
Examples: sum(), count(), avg()Uses AggregateFunction columns

Think of it this way:

  • Aggregate functions perform calculations.
  • AggregatingMergeTree stores the results of those calculations in an intermediate form.

5. Understanding Aggregate States

This is the most important concept behind AggregatingMergeTree.

Normally, when you execute:

SELECT sum(amount)
FROM sales;

ClickHouse returns the final total.

However, AggregatingMergeTree does not store the final value.

Instead, it stores an aggregate state.

For example,

sumState(amount)

creates a partial aggregation.

Later,

sumMerge(total_sales)

combines all stored states to produce the final result.

You can think of an aggregate state as a partially completed calculation that ClickHouse can continue merging efficiently over time.

6. Creating the Tables

Let's build a simple website analytics example that tracks daily page views.

Step 1: Create the Raw Events Table

This table stores every page visit.

CREATE TABLE page_views
(
    event_date Date,
    page String,
    views UInt64
)
ENGINE = MergeTree
ORDER BY (event_date, page);

Example raw data:

DatePageViews
2026-07-01Home1
2026-07-01Home1
2026-07-01Products1
2026-07-01Home1
2026-07-01Contact1

Every page visit is stored as an individual row.

Step 2: Create the Summary Table

Instead of storing final totals, AggregatingMergeTree stores aggregate function states.

CREATE TABLE page_views_summary
(
    event_date Date,
    page String,
    total_views AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (event_date, page);

Notice that the column type is:

AggregateFunction(sum, UInt64)

instead of:

UInt64

This is because the table stores aggregate states, not completed values.

7. Automatically Aggregating Data

Rather than calculating totals manually, create a Materialized View that automatically updates the summary table whenever new events are inserted.

CREATE MATERIALIZED VIEW mv_page_views
TO page_views_summary
AS
SELECT
    event_date,
    page,
    sumState(views) AS total_views
FROM page_views
GROUP BY
    event_date,
    page;

Whenever a new page view is inserted into the page_views table, the materialized view computes the aggregate state and stores it in the summary table automatically.

This eliminates the need to run aggregation queries repeatedly on the raw data.

8. Querying the Summary Table

Since the summary table stores aggregate states, use sumMerge() to retrieve the final aggregated values.

SELECT
    event_date,
    page,
    sumMerge(total_views) AS total_views
FROM page_views_summary
GROUP BY
    event_date,
    page
ORDER BY
    event_date,
    page;

Example Output

DatePageTotal Views
2026-07-01Home35,842
2026-07-01Products12,614
2026-07-01Contact4,238

Instead of scanning millions of individual page-view events, ClickHouse reads a much smaller summary table containing pre-aggregated data. This significantly reduces query execution time and improves dashboard performance, especially for high-traffic analytics workloads.

10. When Should You Use AggregatingMergeTree?

AggregatingMergeTree is an excellent choice for analytical workloads where the same aggregation queries are executed repeatedly. By storing pre-aggregated data, it significantly reduces the amount of data scanned during query execution, resulting in faster response times and lower resource consumption.

Common use cases include:

  • Interactive dashboards with frequent refreshes
  • Daily, hourly, or monthly reporting
  • Business KPIs and executive dashboards
  • Website and application analytics
  • Monitoring and observability platforms
  • Large-scale event or log analytics

However, AggregatingMergeTree is not recommended for:

  • Transactional (OLTP) workloads
  • Applications requiring row-level updates or lookups
  • Frequently modified individual records

11. Common Mistakes to Avoid

When working with AggregatingMergeTree, avoid these common pitfalls:

  • Using sum() instead of sumState() when inserting aggregate data.
  • Querying aggregate states without using the corresponding merge functions such as sumMerge().
  • Defining aggregate columns as regular numeric types instead of AggregateFunction data types.
  • Using AggregatingMergeTree for transactional or row-level processing.
  • Expecting aggregate states to merge immediately after every insert, as background merges occur asynchronously.

Understanding these common mistakes can help you design more efficient aggregation pipelines and avoid unexpected query results.

Conclusion

AggregatingMergeTree is one of the most powerful table engines in ClickHouse® for optimizing analytical workloads. By storing aggregate function states instead of recalculating values from raw data, it dramatically improves query performance while reducing the computational cost of repeated aggregations.

When combined with Materialized Views and aggregate state functions such as sumState() and sumMerge(), it enables scalable, high-performance reporting for dashboards, business intelligence platforms, and real-time analytics applications.

If your ClickHouse deployment frequently executes the same aggregation queries over large datasets, adopting AggregatingMergeTree can significantly improve query performance, reduce resource utilization, and simplify the design of production-ready analytical systems.

References

Work with Quantrail

Expert ClickHouse services

We design, migrate, tune, and run ClickHouse for teams that own their data, from first architecture through day-two operations. Tell us what you are building and we will help.

Talk to an expert

Manage ClickHouse with CHOps

CHOps is our free, open-source ClickHouse admin tool: monitoring, query profiling, backups, visual access control, and alerting in one self-hosted interface, with zero agents on your servers.

Explore CHOps
Share: