Introduction
Time-series data is everywhere. Whether you’re monitoring application performance, collecting IoT sensor readings, tracking financial transactions, or analyzing user activity, data is often generated continuously over time.
As data volumes grow, storing and querying time-series workloads efficiently becomes challenging. Traditional databases may struggle to keep up with high ingestion rates and complex analytical queries.
This is where ClickHouse® shines.
ClickHouse® offers a compelling alternative. It is not a dedicated time-series database, but its columnar storage, high compression, and blazing-fast aggregation capabilities make it one of the best platforms for handling time-series workloads at scale.
In this blog, we’ll explore what time-series data is, why ClickHouse® is well suited for it, how to design tables for time-series use cases, and how to write common time-series queries.
What Is Time-Series Data?
Time-series data consists of observations recorded over time, where each record contains a timestamp and one or more associated values. The timestamp is not just metadata – it is the primary dimension used to query, aggregate, and analyze the data.
Common examples:
- Application logs – error rates, request counts, response times.
- Infrastructure metrics – CPU usage, memory consumption, disk I/O.
- IoT sensor readings – temperature, pressure, humidity over time.
- Financial data – stock prices, exchange rates, transaction volumes.
- User activity – page views, clicks, session durations.
- Business metrics – daily revenue, order counts, conversion rates.
A simple example:
| Timestamp | Temperature |
|---|---|
| 2025-01-01 09:00 | 21.5 |
| 2025-01-02 10:00 | 22.1 |
| 2025-01-03 11:00 | 22.8 |
Key characteristics of time-series data:
- Data is almost always append-only – new records are added, existing ones are rarely updated
- Queries typically involve time ranges – “last 24 hours”, “this week”, “between two dates”
- Aggregations over time are the most common operation – averages, sums, counts grouped by time intervals
- Data volume is high and continuous – events arrive constantly, often in large batches
Why ClickHouse® for Time-Series Data?
Several properties of ClickHouse® make it particularly well suited for time-series workloads.
- Column-Oriented Storage
Unlike traditional row-based databases, ClickHouse® stores data by column.
For example:
server_id metric_name timestamp Value
web-01 cpu_usage 2025-01-01 10:00:00 72.5
web-01 memory_usage 2025-01-01 10:00:00 61.2
web-02 cpu_usage 2025-01-01 10:00:00 45.8
web-02 memory_usage 2025-01-01 10:00:00 78.3
db-01 cpu_usage 2025-01-01 10:00:00 30.1
For time-series queries that typically aggregate one or two columns across millions of rows, this means only the relevant columns are read from disk. As a result, I/O is dramatically reduced compared to row-based databases.
- High Ingestion Rates
Time-series workloads often generate millions of records every hour. ClickHouse® can ingest data at extremely high speeds while maintaining efficient storage and query performance.
This makes it ideal for: Monitoring systems, Logging platforms, IoT platforms, Telemetry pipelines.
- Fast Aggregations
Analytical queries typically involves: COUNT(), SUM(), AVG(), MIN(), MAX(), GROUP BY., etc..,ClickHouse® is heavily optimized for these operations.
ClickHouse® uses vectorized query execution, which means it processes data in large batches using SIMD instructions. This makes aggregations extremely fast, even over billions of rows.
Even on billions of rows, such queries can execute remarkably fast.
- Efficient Compression
Time-series data often contains repetitive values and predictable patterns. ClickHouse® leverages columnar compression to significantly reduce storage requirements while maintaining high query performance.
- Built-in Time Functions
ClickHouse® provides a rich set of time-series specific functions out of the box – toStartOfMinute(), toStartOfHour(), toStartOfDay(), dateDiff(), now(), today(), and many more. These make time-based aggregations simple and expressive.
- Horizontal Scalability
As time-series data grows, ClickHouse® scales horizontally across clusters. Data can be distributed and replicated across nodes, making it suitable for petabyte-scale time-series workloads.
Time-Series Data Modeling in ClickHouse®
A typical time-series table contains:
- Timestamp
- Entity identifier
- Metrics
Example:
CREATE TABLE metrics (
server_id LowCardinality(String),
metric_name LowCardinality(String),
timestamp DateTime,
value Float64
)
ENGINE = MergeTree()
ORDER BY (server_id, metric_name, timestamp);
Designing a Time-Series Table in ClickHouse®
The table design is critical for time-series performance in ClickHouse®. Here are the key decisions.
- Use the MergeTree Engine
The MergeTree family of table engines is the foundation of ClickHouse®. For time-series data, it is the right choice in almost every case.
- ORDER BY – The Most Important Decision
The ORDER BY clause in ClickHouse® determines how data is physically sorted on disk. For time-series tables, always include your most common filter columns first, followed by the timestamp.
-- Good: filter by server and metric, then sort by time
ORDER BY (server_id, timestamp)
-- Less optimal: sorting by time only
ORDER BY timestamp
This physically organizes data by server and time. Hence, ClickHouse® can skip entire ranges of data that don’t match, dramatically reducing the amount of data scanned.
For time-series workloads, the timestamp is almost always part of the sorting key.
- Use DateTime64 for Sub-Second Precision
If your events have millisecond or microsecond precision, use DateTime64 instead of DateTime:
timestamp DateTime64(3) -- millisecond precision
timestamp DateTime64(6) -- microsecond precision
- Use LowCardinality for Repeated String Columns
Columns like server_id, metric_name, region, or status typically have a small number of distinct values repeated many times. Wrapping them with LowCardinality applies dictionary encoding and significantly improves both compression and query speed.
- Partition by Date
Partitioning by date allows ClickHouse® to skip entire partitions when queries filter by time range. This is one of the most effective optimizations for time-series tables.
CREATE TABLE metrics (
server_id LowCardinality(String),
metric_name LowCardinality(String),
timestamp DateTime,
value Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (server_id, metric_name, timestamp);
With this setup, a query filtering for “last 30 days” only reads the relevant monthly partitions everything else is skipped entirely.
Inserting Time-Series Data
ClickHouse® performs best with batch inserts rather than one row at a time. For time-series workloads, always insert data in batches of at least 1,000 rows, and ideally 10,000 or more.
INSERT INTO metrics (server_id, metric_name, timestamp, value) VALUES
('web-01', 'cpu_usage', '2025-01-01 10:00:00', 72.5),
('web-01', 'memory_usage', '2025-01-01 10:00:00', 61.2),
('web-02', 'cpu_usage', '2025-01-01 10:00:00', 45.8),
('web-02', 'memory_usage', '2025-01-01 10:00:00', 78.3),
('db-01', 'cpu_usage', '2025-01-01 10:00:00', 30.1);
In production, time-series data is typically ingested through a pipeline using tools like Kafka, Vector, Fluent Bit, or Telegraf, all of which support ClickHouse® as a destination.
Common Time-Series Queries
1. Aggregate Metrics Over a Time Range
The most common time-series query – compute average CPU usage per hour for a specific server over the last 24 hours:
SELECT
toStartOfHour(timestamp) AS hour,
avg(value) AS avg_cpu
FROM metrics
WHERE server_id = 'web-01'
AND metric_name = 'cpu_usage'
AND timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour ASC;
2. Down sample Data by Time Interval
Downsampling reduces data resolution for dashboards or long-term trend analysis:
SELECT
toStartOfDay(timestamp) AS day,
avg(value) AS avg_cpu,
min(value) AS min_cpu,
max(value) AS max_cpu
FROM metrics
WHERE metric_name = 'cpu_usage'
AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day ASC;
3. Find Peak Values
SELECT
server_id,
max(value) AS peak_cpu,
argMax(timestamp, value) AS peak_time
FROM metrics
WHERE metric_name = 'cpu_usage'
AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY server_id
ORDER BY peak_cpu DESC;
4. Count Events Per Minute
Useful for monitoring request rates, error counts, or event frequency:
SELECT
toStartOfMinute(timestamp) AS minute,
count() AS event_count
FROM metrics
WHERE server_id = 'web-01'
AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute ASC;
5. Calculate the Rate of Change
Compare current values to the previous period to detect spikes or drops:
SELECT
toStartOfHour(timestamp) AS hour,
avg(value) AS avg_cpu,
avg(value) - lagInFrame(avg(value)) OVER (ORDER BY toStartOfHour(timestamp)) AS change
FROM metrics
WHERE metric_name = 'cpu_usage'
AND server_id = 'web-01'
AND timestamp >= now() - INTERVAL 12 HOUR
GROUP BY hour
ORDER BY hour ASC;
6. Working with Time Buckets
A common requirement in time-series analytics is grouping data into intervals.
ClickHouse® provides several functions for this:
toStartOfMinute()
toStartOfHour()
toStartOfDay()
toStartOfWeek()
toStartOfMonth()
Example:
SELECT
toStartOfDay(timestamp) AS day,
count() AS events
FROM metrics
GROUP BY day
ORDER BY day;
This produces daily event counts.
7. Retention and Data Lifecycle
One of the most useful features for time-series workloads is TTL (Time To Live). It allows you to automatically delete or move old data based on the timestamp.
Time-series datasets grow rapidly. To automatically remove old data, ClickHouse® supports TTL (Time To Live).
Example:
CREATE TABLE metrics
(
timestamp DateTime,
server_id String,
cpu_usage Float32
)
ENGINE = MergeTree
ORDER BY (server_id, timestamp)
TTL timestamp + INTERVAL 90 DAY;
In this example, data older than 90 days is automatically removed. This is particularly useful for log and metrics data where old records lose their value over time. As a result, storage costs are kept under control automatically without manual cleanup jobs.
Useful ClickHouse® Time Functions
ClickHouse® provides a rich set of built-in functions specifically designed for time-series analysis:
| Function | Description |
|---|---|
| now() | Returns the current date and time |
| today() | Returns the current date |
| toStartOfMinute(t) | Rounds down to the beginning of the minute |
| toStartOfHour(t) | Rounds down to the beginning of the hour |
| toStartOfDay(t) | Rounds down to the beginning of the day |
| toStartOfWeek(t) | Start of week (Monday by default) |
| toStartOfMonth(t) | Truncates to the start of the month |
| toYYYYMM(t) | Returns year and month as integer |
| toYear(t) | Extracts the year from a timestamp |
| dateDiff(‘unit’, t1, t2) | Difference between two timestamps |
| toUnixTimestamp(t) | Converts datetime to Unix timestamp |
| fromUnixTimestamp(n) | Converts Unix timestamp to datetime |
| addHours(t, n) | Adds n hours to a timestamp |
| subtractDays(t, n) | Subtracts n days from a timestamp |
ClickHouse® is not a dedicated time-series database, but for analytical workloads involving large volumes of time-stamped data, it frequently outperforms dedicated alternatives. Furthermore, its full SQL support means there is no proprietary query language to learn.
Best Practices for Time-Series in ClickHouse®
- Always include the timestamp column in the ORDER BY clause, typically as the last column after your filter dimensions.
- Partition by month or day using
toYYYYMM()ortoYYYYMMDD()to enable partition pruning on time range queries. - Use
LowCardinalityfor any string column with fewer than 10,000 distinct values. - Use
DateTime64(3)for millisecond precision instead of storing timestamps as integers. - Always insert in batches, avoid single-row inserts in production.
- Use TTL (Time To Live) to automatically expire old data and manage storage costs.
Exploring ClickHouse® for Your Analytics?
At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics – from Kubernetes deployments and migrations to performance tuning in production.
We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production – a story we’ve shared here:
Success Story: Quantrail Bare-Metal ClickHouse® Deployment
If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.
Contact
Quantrail Data
Final Thoughts
ClickHouse® is not marketed as a time-series database, but in practice it handles time-series workloads exceptionally well. Its columnar storage, high compression, fast aggregations, and rich time functions make it a natural fit for any system that needs to store and query large volumes of time-stamped data.
Whether you are building an infrastructure monitoring platform, a real-time analytics dashboard, or a financial data pipeline, ClickHouse® gives you the tools to handle time-series data at scale, without the limitations of dedicated time-series databases.
