ClickHouse® for Time-Series Data: A Quick Introduction

Introduction

Time-series data is everywhere. Whether you’re monitoring application performance, collecting IoT sensor readings, tracking financial transactions, or analyzing user activity, data is often generated continuously over time.

As data volumes grow, storing and querying time-series workloads efficiently becomes challenging. Traditional databases may struggle to keep up with high ingestion rates and complex analytical queries.

This is where ClickHouse® shines.

ClickHouse® offers a compelling alternative. It is not a dedicated time-series database, but its columnar storage, high compression, and blazing-fast aggregation capabilities make it one of the best platforms for handling time-series workloads at scale.

In this blog, we’ll explore what time-series data is, why ClickHouse® is well suited for it, how to design tables for time-series use cases, and how to write common time-series queries.

What Is Time-Series Data?

Time-series data consists of observations recorded over time, where each record contains a timestamp and one or more associated values. The timestamp is not just metadata – it is the primary dimension used to query, aggregate, and analyze the data.

Common examples:

Application logs – error rates, request counts, response times.
Infrastructure metrics – CPU usage, memory consumption, disk I/O.
IoT sensor readings – temperature, pressure, humidity over time.
Financial data – stock prices, exchange rates, transaction volumes.
User activity – page views, clicks, session durations.
Business metrics – daily revenue, order counts, conversion rates.

A simple example:

Timestamp	Temperature
2025-01-01 09:00	21.5
2025-01-02 10:00	22.1
2025-01-03 11:00	22.8

Key characteristics of time-series data:

Data is almost always append-only – new records are added, existing ones are rarely updated
Queries typically involve time ranges – “last 24 hours”, “this week”, “between two dates”
Aggregations over time are the most common operation – averages, sums, counts grouped by time intervals
Data volume is high and continuous – events arrive constantly, often in large batches

Why ClickHouse® for Time-Series Data?

Several properties of ClickHouse® make it particularly well suited for time-series workloads.

Column-Oriented Storage

Unlike traditional row-based databases, ClickHouse® stores data by column.

For example:

        server_id     metric_name             timestamp             Value
         
         web-01       cpu_usage         2025-01-01 10:00:00         72.5
         web-01       memory_usage      2025-01-01 10:00:00         61.2
         web-02       cpu_usage         2025-01-01 10:00:00         45.8
         web-02       memory_usage      2025-01-01 10:00:00         78.3
         db-01        cpu_usage         2025-01-01 10:00:00         30.1

For time-series queries that typically aggregate one or two columns across millions of rows, this means only the relevant columns are read from disk. As a result, I/O is dramatically reduced compared to row-based databases.

High Ingestion Rates

Time-series workloads often generate millions of records every hour. ClickHouse® can ingest data at extremely high speeds while maintaining efficient storage and query performance.

This makes it ideal for: Monitoring systems, Logging platforms, IoT platforms, Telemetry pipelines.

Fast Aggregations

Analytical queries typically involves: COUNT(), SUM(), AVG(), MIN(), MAX(), GROUP BY., etc..,ClickHouse® is heavily optimized for these operations.

ClickHouse® uses vectorized query execution, which means it processes data in large batches using SIMD instructions. This makes aggregations extremely fast, even over billions of rows.
Even on billions of rows, such queries can execute remarkably fast.

Efficient Compression

Time-series data often contains repetitive values and predictable patterns. ClickHouse® leverages columnar compression to significantly reduce storage requirements while maintaining high query performance.

Built-in Time Functions

ClickHouse® provides a rich set of time-series specific functions out of the box – toStartOfMinute(), toStartOfHour(), toStartOfDay(), dateDiff(), now(), today(), and many more. These make time-based aggregations simple and expressive.

Horizontal Scalability

As time-series data grows, ClickHouse® scales horizontally across clusters. Data can be distributed and replicated across nodes, making it suitable for petabyte-scale time-series workloads.

Time-Series Data Modeling in ClickHouse®

A typical time-series table contains:

Timestamp
Entity identifier
Metrics

Example:

CREATE TABLE metrics (
    server_id   LowCardinality(String),
    metric_name LowCardinality(String),
    timestamp   DateTime,
    value       Float64
) 
ENGINE = MergeTree()
ORDER BY (server_id, metric_name, timestamp);

Designing a Time-Series Table in ClickHouse®

The table design is critical for time-series performance in ClickHouse®. Here are the key decisions.

Use the MergeTree Engine

The MergeTree family of table engines is the foundation of ClickHouse®. For time-series data, it is the right choice in almost every case.

ORDER BY – The Most Important Decision

The ORDER BY clause in ClickHouse® determines how data is physically sorted on disk. For time-series tables, always include your most common filter columns first, followed by the timestamp.

-- Good: filter by server and metric, then sort by time
ORDER BY (server_id, timestamp)

-- Less optimal: sorting by time only
ORDER BY timestamp

This physically organizes data by server and time. Hence, ClickHouse® can skip entire ranges of data that don’t match, dramatically reducing the amount of data scanned.

For time-series workloads, the timestamp is almost always part of the sorting key.

Use DateTime64 for Sub-Second Precision

If your events have millisecond or microsecond precision, use DateTime64 instead of DateTime:

timestamp DateTime64(3) -- millisecond precision
timestamp DateTime64(6) -- microsecond precision

Use LowCardinality for Repeated String Columns

Columns like server_id, metric_name, region, or status typically have a small number of distinct values repeated many times. Wrapping them with LowCardinality applies dictionary encoding and significantly improves both compression and query speed.

Partition by Date

Partitioning by date allows ClickHouse® to skip entire partitions when queries filter by time range. This is one of the most effective optimizations for time-series tables.

CREATE TABLE metrics (
   server_id LowCardinality(String),
   metric_name LowCardinality(String),
   timestamp DateTime,
   value Float64
) 
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (server_id, metric_name, timestamp);

With this setup, a query filtering for “last 30 days” only reads the relevant monthly partitions everything else is skipped entirely.

Inserting Time-Series Data

ClickHouse® performs best with batch inserts rather than one row at a time. For time-series workloads, always insert data in batches of at least 1,000 rows, and ideally 10,000 or more.

INSERT INTO metrics (server_id, metric_name, timestamp, value) VALUES
('web-01', 'cpu_usage', '2025-01-01 10:00:00', 72.5),
('web-01', 'memory_usage', '2025-01-01 10:00:00', 61.2),
('web-02', 'cpu_usage', '2025-01-01 10:00:00', 45.8),
('web-02', 'memory_usage', '2025-01-01 10:00:00', 78.3),
('db-01', 'cpu_usage', '2025-01-01 10:00:00', 30.1);

In production, time-series data is typically ingested through a pipeline using tools like Kafka, Vector, Fluent Bit, or Telegraf, all of which support ClickHouse® as a destination.

Common Time-Series Queries

1. Aggregate Metrics Over a Time Range

The most common time-series query – compute average CPU usage per hour for a specific server over the last 24 hours:

SELECT
   toStartOfHour(timestamp) AS hour,
   avg(value) AS avg_cpu
FROM metrics
WHERE server_id = 'web-01'
AND metric_name = 'cpu_usage'
AND timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour ASC;

2. Down sample Data by Time Interval

Downsampling reduces data resolution for dashboards or long-term trend analysis:

SELECT
   toStartOfDay(timestamp) AS day,
   avg(value) AS avg_cpu,
   min(value) AS min_cpu,
   max(value) AS max_cpu
FROM metrics
WHERE metric_name = 'cpu_usage'
AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day ASC;

3. Find Peak Values

SELECT
   server_id,
   max(value) AS peak_cpu,
   argMax(timestamp, value) AS peak_time
FROM metrics
WHERE metric_name = 'cpu_usage'
AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY server_id
ORDER BY peak_cpu DESC;

4. Count Events Per Minute

Useful for monitoring request rates, error counts, or event frequency:

SELECT
   toStartOfMinute(timestamp) AS minute,
   count() AS event_count
FROM metrics
WHERE server_id = 'web-01'
AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute ASC;

5. Calculate the Rate of Change

Compare current values to the previous period to detect spikes or drops:

SELECT
   toStartOfHour(timestamp) AS hour,
   avg(value) AS avg_cpu,
   avg(value) - lagInFrame(avg(value)) OVER (ORDER BY toStartOfHour(timestamp)) AS change
FROM metrics
WHERE metric_name = 'cpu_usage'
AND server_id = 'web-01'
AND timestamp >= now() - INTERVAL 12 HOUR
GROUP BY hour
ORDER BY hour ASC;

6. Working with Time Buckets

A common requirement in time-series analytics is grouping data into intervals.

ClickHouse® provides several functions for this:

toStartOfMinute()
toStartOfHour()
toStartOfDay()
toStartOfWeek()
toStartOfMonth()

Example:

SELECT
    toStartOfDay(timestamp) AS day,
    count() AS events
FROM metrics
GROUP BY day
ORDER BY day;

This produces daily event counts.

7. Retention and Data Lifecycle

One of the most useful features for time-series workloads is TTL (Time To Live). It allows you to automatically delete or move old data based on the timestamp.

Time-series datasets grow rapidly. To automatically remove old data, ClickHouse® supports TTL (Time To Live).

Example:

CREATE TABLE metrics
(
    timestamp DateTime,
    server_id String,
    cpu_usage Float32
)
ENGINE = MergeTree
ORDER BY (server_id, timestamp)
TTL timestamp + INTERVAL 90 DAY;

In this example, data older than 90 days is automatically removed. This is particularly useful for log and metrics data where old records lose their value over time. As a result, storage costs are kept under control automatically without manual cleanup jobs.

Useful ClickHouse® Time Functions

ClickHouse® provides a rich set of built-in functions specifically designed for time-series analysis:

Function	Description
now()	Returns the current date and time
today()	Returns the current date
toStartOfMinute(t)	Rounds down to the beginning of the minute
toStartOfHour(t)	Rounds down to the beginning of the hour
toStartOfDay(t)	Rounds down to the beginning of the day
toStartOfWeek(t)	Start of week (Monday by default)
toStartOfMonth(t)	Truncates to the start of the month
toYYYYMM(t)	Returns year and month as integer
toYear(t)	Extracts the year from a timestamp
dateDiff(‘unit’, t1, t2)	Difference between two timestamps
toUnixTimestamp(t)	Converts datetime to Unix timestamp
fromUnixTimestamp(n)	Converts Unix timestamp to datetime
addHours(t, n)	Adds n hours to a timestamp
subtractDays(t, n)	Subtracts n days from a timestamp

ClickHouse® is not a dedicated time-series database, but for analytical workloads involving large volumes of time-stamped data, it frequently outperforms dedicated alternatives. Furthermore, its full SQL support means there is no proprietary query language to learn.

Best Practices for Time-Series in ClickHouse®

Always include the timestamp column in the ORDER BY clause, typically as the last column after your filter dimensions.
Partition by month or day using toYYYYMM() or toYYYYMMDD() to enable partition pruning on time range queries.
Use LowCardinality for any string column with fewer than 10,000 distinct values.
Use DateTime64(3) for millisecond precision instead of storing timestamps as integers.
Always insert in batches, avoid single-row inserts in production.
Use TTL (Time To Live) to automatically expire old data and manage storage costs.

Exploring ClickHouse® for Your Analytics?

At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics – from Kubernetes deployments and migrations to performance tuning in production.
We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production – a story we’ve shared here:

Success Story: Quantrail Bare-Metal ClickHouse® Deployment

If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.

Contact
Quantrail Data

Final Thoughts

ClickHouse® is not marketed as a time-series database, but in practice it handles time-series workloads exceptionally well. Its columnar storage, high compression, fast aggregations, and rich time functions make it a natural fit for any system that needs to store and query large volumes of time-stamped data.

Whether you are building an infrastructure monitoring platform, a real-time analytics dashboard, or a financial data pipeline, ClickHouse® gives you the tools to handle time-series data at scale, without the limitations of dedicated time-series databases.

References

ClickHouse® Documentation – Date and Time Functions

Time series – Use case guide

Working with Time Series Data – ClickHouse

Post Views: 22