Introduction
One of the biggest reasons organizations choose ClickHouse® is its ability to ingest millions of rows per second while maintaining lightning-fast query performance. Whether you're collecting application logs, IoT sensor data, financial transactions, or user analytics, efficient data ingestion is essential for building a high-performance analytical platform.
However, achieving maximum insert performance isn't just about using powerful hardware. Table design, batching strategy, partitioning, compression, and background merge operations all play an important role in determining how efficiently ClickHouse handles incoming data.
In this blog, we'll explore the best practices for optimizing ClickHouse for high-throughput inserts, helping you reduce ingestion latency, minimize system overhead, and maximize overall performance.
Why Insert Performance Matters
Efficient data ingestion is critical for analytics workloads where data is continuously generated from multiple sources.
Poor insert performance can lead to:
- Increased ingestion latency
- Higher CPU and disk utilization
- Excessive background merge operations
- Large numbers of small data parts
- Reduced query performance
By optimizing insert operations, you can reduce system overhead while ensuring that newly ingested data becomes available for analysis as quickly as possible.
How ClickHouse Handles Inserts
Unlike traditional row-oriented databases, ClickHouse is optimized for batch inserts.
When data is inserted into a MergeTree table, ClickHouse performs the following steps:
- Creates a new data part.
- Sorts the data according to the primary key (ORDER BY).
- Compresses the column data.
- Writes the data part to disk.
- Background merge threads combine smaller parts into larger ones over time.
This architecture enables ClickHouse to efficiently handle large-scale data ingestion while maintaining excellent query performance.
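As a rough illustration of the steps above, the sketch below inserts a few unsorted rows and then looks up the resulting part in system.parts. The table name insert_demo and its columns are assumptions made for this example only.

```sql
-- Hypothetical demo table; the name and columns are illustrative only.
CREATE TABLE IF NOT EXISTS insert_demo
(
    event_date Date,
    user_id    UInt64,
    value      String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- One INSERT statement: the rows arrive unsorted, are sorted by the
-- ORDER BY key, compressed column by column, and written as a new part.
INSERT INTO insert_demo VALUES
    ('2024-01-02', 42, 'b'),
    ('2024-01-01', 7,  'a'),
    ('2024-01-01', 3,  'c');

-- List the active parts that currently back the table.
SELECT name, rows, bytes_on_disk, level
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'insert_demo'
  AND active;
```

A level of 0 marks a part written directly by an insert; parts produced by background merges carry a higher level.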
Best Practices for High-Throughput Inserts
1. Insert Data in Large Batches
Batch inserts are one of the most effective ways to improve ingestion performance.
Small inserts generate many tiny data parts, increasing merge operations and placing unnecessary load on the system.
Less Efficient
    INSERT INTO events VALUES (...);
Executing this statement repeatedly for individual rows creates excessive overhead.
Recommended
    INSERT INTO events FORMAT CSV
Load thousands or even millions of rows in a single insert operation.
Recommended Batch Sizes
| Rows per Insert | Performance |
|---|---|
| 1–100 | Poor |
| 1,000–10,000 | Good |
| 10,000–100,000+ | Excellent |
Larger batches reduce the number of generated parts and improve overall throughput.
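To see the effect of batching without an external data file, one option is to generate a batch with the numbers() table function and load it in a single statement. This is a minimal sketch: the table batch_demo and the 100,000-row batch size are assumptions chosen for illustration.

```sql
-- Hypothetical demo table; the name and columns are illustrative only.
CREATE TABLE IF NOT EXISTS batch_demo
(
    event_date Date,
    user_id    UInt64,
    value      String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- One statement carrying 100,000 rows instead of 100,000 single-row inserts.
INSERT INTO batch_demo
SELECT
    today()          AS event_date,
    number % 1000    AS user_id,
    toString(number) AS value
FROM numbers(100000);

-- Compare how many parts were created with how many rows were loaded.
SELECT count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'batch_demo'
  AND active;
```

Running the same check after many single-row inserts should show a far higher part count for the same number of rows, until background merges catch up.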
2. Choose an Efficient Primary Key
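The primary key (ORDER BY) determines how data is physically sorted on disk. Unless a separate PRIMARY KEY clause is declared, the ORDER BY expression serves as both the sorting key and the primary key.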
A well-designed primary key should:
- Match common query patterns
- Minimize unnecessary sorting
- Avoid leading with excessively high-cardinality columns; place lower-cardinality columns first
- Support efficient filtering
Example:
    ORDER BY (event_date, user_id)
A carefully selected primary key improves both insert efficiency and query performance.
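For context, here is how that key might look in a full table definition, together with a way to check whether a query actually uses it. The table pk_demo, its columns, and the filter values are hypothetical.

```sql
-- Hypothetical demo table; the lower-cardinality column (event_date)
-- comes first in the key, followed by the higher-cardinality user_id.
CREATE TABLE IF NOT EXISTS pk_demo
(
    event_date Date,
    user_id    UInt64,
    event_type LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Ask ClickHouse which parts and granules the primary key lets it skip.
EXPLAIN indexes = 1
SELECT count()
FROM pk_demo
WHERE event_date = '2024-01-01'
  AND user_id = 42;
```

If the common filters do not appear in the key conditions reported by EXPLAIN, the key probably does not match the query pattern.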
3. Partition Data Effectively
Partitioning organizes data into logical groups, making inserts and maintenance more efficient.
Example:
    PARTITION BY toYYYYMM(event_date)
Benefits include:
- Faster inserts
- Efficient partition pruning
- Simplified data retention
- Easier maintenance operations
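Avoid creating too many partitions. Parts are never merged across partitions, and a single insert that spans many partitions produces at least one part per partition, so excessive partition counts increase metadata and merge overhead.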
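The sketch below shows monthly partitioning end to end: one insert that touches two months, the parts it produces per partition, and a retention-style partition drop. The table partition_demo and the dates are assumptions for illustration.

```sql
-- Hypothetical demo table partitioned by month.
CREATE TABLE IF NOT EXISTS partition_demo
(
    event_date Date,
    user_id    UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

-- A single insert whose rows fall into two different months.
INSERT INTO partition_demo
SELECT
    addMonths(toDate('2024-01-15'), number % 2) AS event_date,
    number                                      AS user_id
FROM numbers(10);

-- Each partition touched by the insert gets its own part.
SELECT partition, count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'partition_demo'
  AND active
GROUP BY partition
ORDER BY partition;

-- Retention: remove a whole month without rewriting other data.
ALTER TABLE partition_demo DROP PARTITION 202401;
```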
4. Minimize Small Data Parts
Every insert creates at least one new data part (one for each partition it touches). Frequent small inserts lead to a large number of parts, increasing background merge activity.
Common issues include:
- High CPU utilization
- Increased disk I/O
- Longer merge times
- Slower query performance
Monitor active parts using:
    SELECT
        database,
        table,
        count() AS active_parts
    FROM system.parts
    WHERE active
    GROUP BY database, table;
If part counts continue growing, consider increasing insert batch sizes.
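Because the MergeTree part limits (the parts_to_delay_insert and parts_to_throw_insert settings) are evaluated per partition, a per-partition breakdown can be more telling than a per-table total. The query below is one possible variant; the LIMIT of 20 is arbitrary.

```sql
-- Active parts per partition, busiest partitions first.
SELECT
    database,
    table,
    partition,
    count()                                AS active_parts,
    sum(rows)                              AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
  AND database != 'system'
GROUP BY database, table, partition
ORDER BY active_parts DESC
LIMIT 20;
```

A partition with many parts but few rows per part is a typical sign of inserts that are too small.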
5. Use High-Performance Data Formats
ClickHouse supports multiple input formats for data ingestion.
Common formats include:
- Native
- Parquet
- CSV
- JSONEachRow
The Native format provides the best performance because it minimizes parsing overhead and is optimized specifically for ClickHouse.
Whenever possible, use the Native format for large-scale data ingestion.
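As a small sketch of working with the Native format, the statements below export a table to a Native file and load it back. It assumes the statements run in clickhouse-client, which handles INTO OUTFILE and FROM INFILE on the client side; the table format_demo and the file name are hypothetical.

```sql
-- Hypothetical demo table with some generated rows.
CREATE TABLE IF NOT EXISTS format_demo
(
    event_date Date,
    user_id    UInt64
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

INSERT INTO format_demo
SELECT today() AS event_date, number AS user_id
FROM numbers(1000);

-- Write the rows to a local file in Native format
-- (the statement fails if the file already exists).
SELECT * FROM format_demo
INTO OUTFILE 'format_demo.native'
FORMAT Native;

-- Load the same file back; the columnar blocks need no text parsing.
INSERT INTO format_demo
FROM INFILE 'format_demo.native'
FORMAT Native;
```

Client libraries that speak the native protocol usually send data in this format already, so the explicit file step matters mostly for bulk loads and migrations.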
6. Optimize Compression Codecs
Compression reduces storage requirements and improves disk efficiency.
ClickHouse supports multiple compression codecs, including:
- LZ4 (default)
- ZSTD
Example:
    value String CODEC(ZSTD)
- LZ4 provides faster compression and decompression.
- ZSTD offers higher compression ratios, reducing storage usage.
Choose the codec that best fits your workload and storage requirements.
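One way to make that choice is to measure it. The sketch below declares a codec per column and then compares compressed and uncompressed sizes from system.columns. The table codec_demo, the generated data, and the specific codec choices are assumptions, not recommendations.

```sql
-- Hypothetical demo table with a different codec on each column.
CREATE TABLE IF NOT EXISTS codec_demo
(
    event_time DateTime CODEC(Delta, ZSTD),
    user_id    UInt64   CODEC(LZ4),
    value      String   CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY event_time;

-- Generate some sample rows to compress.
INSERT INTO codec_demo
SELECT
    toDateTime('2024-01-01 00:00:00') + number  AS event_time,
    number % 1000                               AS user_id,
    concat('payload-', toString(number % 100))  AS value
FROM numbers(100000);

-- Compare on-disk size with raw size for each column.
SELECT
    name,
    compression_codec,
    data_compressed_bytes,
    data_uncompressed_bytes,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'codec_demo';
```

Ratios measured on generated data like this say little about production data, so it is worth repeating the comparison on a realistic sample.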
7. Optimize Background Merge Operations
Background merges combine smaller data parts into larger ones, improving query performance and reducing storage fragmentation.
Monitor merge activity using:
    SELECT * FROM system.merges;
Frequent merge activity may indicate:
- Small insert batches
- Excessive concurrent inserts
- High ingestion rates
Using larger batches reduces merge pressure and improves system efficiency.
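When SELECT * returns too much detail, a narrower query over system.merges can be easier to read. The column selection below is one possible choice.

```sql
-- Currently running merges, longest-running first.
SELECT
    database,
    table,
    round(elapsed, 1)                              AS elapsed_seconds,
    round(progress * 100, 1)                       AS progress_percent,
    num_parts,
    result_part_name,
    formatReadableSize(total_size_bytes_compressed) AS merge_size
FROM system.merges
ORDER BY elapsed DESC;
```

Many concurrent merges with a small merge_size each usually point back to small insert batches rather than to a problem with the merges themselves.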
8. Monitor Insert Performance
Monitoring helps identify bottlenecks before they affect production workloads.
The system.query_log table provides information about recent insert operations.
Example:
    SELECT
        query,
        query_duration_ms,
        written_rows,
        written_bytes
    FROM system.query_log
    WHERE type = 'QueryFinish'
      AND query_kind = 'Insert'
    ORDER BY event_time DESC
    LIMIT 10;
Regular monitoring allows you to identify slow inserts and optimize ingestion performance.
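For trend monitoring, it may help to aggregate the log rather than read individual queries. The sketch below groups finished inserts by minute; the one-hour window is an arbitrary choice for illustration.

```sql
-- Insert volume and latency per minute over the last hour.
SELECT
    toStartOfMinute(event_time)      AS minute,
    count()                          AS inserts,
    sum(written_rows)                AS rows_written,
    round(avg(written_rows))         AS avg_rows_per_insert,
    round(avg(query_duration_ms), 1) AS avg_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute DESC;
```

A high insert count combined with a low avg_rows_per_insert suggests the workload would benefit from larger batches or asynchronous inserts.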
9. Enable Asynchronous Inserts
Applications that generate frequent small inserts can benefit from asynchronous inserts.
Example:
    SET async_insert = 1;
    SET wait_for_async_insert = 1;
Advantages include:
- Automatic batching of small inserts
- Fewer generated data parts
- Reduced merge overhead
- Improved ingestion throughput
Asynchronous inserts are particularly useful for streaming applications and telemetry workloads.
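The same settings can also be attached to a single statement instead of the whole session, which is convenient when only some writers send small inserts. A minimal sketch, assuming a hypothetical table async_demo:

```sql
-- Hypothetical demo table; the name and columns are illustrative only.
CREATE TABLE IF NOT EXISTS async_demo
(
    event_date Date,
    user_id    UInt64,
    value      String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Per-statement settings: the row is buffered on the server and flushed
-- together with other buffered rows. With wait_for_async_insert = 1 the
-- statement returns only after that flush has completed.
INSERT INTO async_demo
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('2024-01-01', 1, 'small write');
```

Setting wait_for_async_insert = 0 returns sooner, but the client then no longer learns whether the buffered data was written successfully.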
10. Optimize Insert Settings
ClickHouse provides several settings that can improve insert performance depending on your workload.
Some commonly used settings include:
- async_insert – Buffers small inserts before writing them to disk.
- wait_for_async_insert – Ensures data is written before returning success.
- max_insert_block_size – Controls the number of rows processed in a single insert block.
- max_partitions_per_insert_block – Prevents inserts from creating too many partitions.
Carefully tuning these settings can improve throughput, reduce merge overhead, and make ingestion more efficient, especially in streaming or high-volume environments.
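Before changing anything, it can be useful to check what the current session is actually using. The query below reads the four settings from system.settings; the value in the SET statement is an arbitrary example, not a recommendation.

```sql
-- Current values, and whether they differ from the defaults.
SELECT name, value, changed
FROM system.settings
WHERE name IN (
    'async_insert',
    'wait_for_async_insert',
    'max_insert_block_size',
    'max_partitions_per_insert_block'
);

-- Session-level override (illustrative value only).
SET max_partitions_per_insert_block = 50;
```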
Best Practices Checklist
- Insert data in large batches.
- Choose an efficient primary key.
- Partition data appropriately.
- Minimize the creation of small data parts.
- Use the Native format whenever possible.
- Select appropriate compression codecs.
- Monitor system.parts and system.merges.
- Review insert performance using system.query_log.
- Enable asynchronous inserts for frequent small writes.
- Balance concurrent insert workloads.
Conclusion
ClickHouse is capable of ingesting millions of rows per second, but achieving consistent high-throughput performance requires thoughtful configuration and efficient data-loading strategies.
By using larger batch inserts, selecting an appropriate primary key, partitioning data effectively, optimizing compression, and monitoring merge activity, you can significantly improve ingestion performance while maintaining fast analytical queries.
Whether you're building real-time dashboards, processing application logs, handling IoT telemetry, or analyzing business events, these best practices will help you maximize ClickHouse's ingestion capabilities and build scalable, high-performance analytics systems.
References
- ClickHouse Documentation – Selecting an Insert Strategy
  https://clickhouse.com/docs/best-practices/selecting-an-insert-strategy
- ClickHouse Documentation – MergeTree Table Engine
  https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree
- ClickHouse Documentation – Asynchronous Inserts
  https://clickhouse.com/docs/optimize/asynchronous-inserts



