Engineers approaching ClickHouse® often carry mental models shaped by traditional OLTP databases. In many systems, replication is instance-wide, leadership is global, and a single primary node coordinates writes and failover.
ClickHouse® does not follow that model.
Its replication architecture is table-scoped, not instance-scoped. Coordination is granular rather than global. And ClickHouse® Keeper is a metadata coordinator, not a primary database node.
Understanding this distinction is essential for architects designing high-availability analytical systems and distributed deployments.
The Mental Model Many Engineers Bring
In many transactional databases, replication operates at the instance level:
- One primary node accepts writes
- Replicas follow by consuming transaction logs
- Leadership is cluster-wide
- Failover promotes a new primary
Replication scope is the entire database instance.
This model centralizes write coordination and simplifies transactional consistency, which is appropriate for OLTP workloads.
Because this pattern is common, it is natural to assume similar behavior in other databases.
But ClickHouse® was built for analytical workloads, and its replication model reflects different design priorities.
Understanding the ClickHouse® Table-Scoped Replication Architecture
ClickHouse® replication operates at the level of ReplicatedMergeTree tables, not entire server instances.
In a typical ClickHouse® cluster:
- Data is distributed across shards
- Each shard may have multiple replicas
- Replication is defined per replicated table
- Each replicated table elects its own leader
There is no global primary server.
Writes are distributed according to sharding logic. Replication is asynchronous by default. Coordination occurs independently for each replicated table.
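A minimal sketch makes the table scope concrete. The cluster name, table schema, and ZooKeeper path below are illustrative assumptions; `{shard}` and `{replica}` are macros each server defines in its own configuration:

```sql
-- Replication is declared per table, not per server.
-- The first argument is the coordination path in Keeper;
-- the second identifies this replica within that path.
CREATE TABLE events ON CLUSTER my_cluster
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (user_id, event_time);
```

A server can host a mix of replicated and non-replicated tables; replication applies only to tables declared this way.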
This fundamentally changes how leadership and failure domains should be understood.
Role of ClickHouse® Keeper in the Replication Architecture
ClickHouse® Keeper (or ZooKeeper in earlier deployments) acts as a coordination service.
It does not:
- Store user data
- Serve queries
- Act as a database primary
Instead, Keeper manages:
- Replication metadata
- Part tracking across replicas
- Leader election per replicated table
- Coordination of merges and mutations
Each replicated table elects a leader replica responsible for coordinating background merges and mutations for that table. Keeper facilitates this election and maintains the coordination state.
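This per-table leadership is observable from SQL. As a sketch, the `system.replicas` table reports coordination state row by row, one row per replicated table on the server:

```sql
-- is_leader is reported per replicated table, not per server:
-- the same server can lead merges for one table and follow for another.
SELECT database, table, replica_name, is_leader, is_readonly
FROM system.replicas;
```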
Leadership is therefore:
- Scoped per table
- Internal to replication coordination
- Independent from query routing
This is not a primary/replica hierarchy. It is a distributed coordination mechanism.
Sharding and Replication: Independent Dimensions
In ClickHouse® architecture:
- Sharding distributes data horizontally across nodes.
- Replication provides redundancy within each shard.
These are orthogonal design dimensions.
In a multi-shard deployment:
- Each shard can have multiple replicas
- Each replicated table within a shard elects its own leader
- There is no cluster-wide primary
This separation enables horizontal scalability without introducing a single global coordination bottleneck.
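The two dimensions can be sketched together. Assuming local replicated tables named `events` exist on each node, a Distributed table (here hypothetically named `events_all`, with `rand()` as the sharding key) layers shard routing on top of per-shard replication:

```sql
-- The Distributed engine handles the sharding dimension only;
-- replication happens independently inside each shard.
CREATE TABLE events_all ON CLUSTER my_cluster AS default.events
ENGINE = Distributed(my_cluster, default, events, rand());
```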
Architectural Implications
For architects, this replication model has meaningful consequences.
1. Failure Domains Are Granular
Because coordination is table-scoped:
- Failure of one replica does not imply global failover
- Merge leadership is localized
- Replication health is evaluated per table
This reduces blast radius compared to instance-wide primary models.
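In practice, this means monitoring queries also operate per table. A sketch of a health check, with the 60-second delay threshold as an arbitrary assumption:

```sql
-- Replication health is evaluated table by table,
-- not as a single cluster-wide primary/replica status.
SELECT database, table, queue_size, absolute_delay, is_readonly
FROM system.replicas
WHERE absolute_delay > 60 OR is_readonly;
```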
2. There Is No Global Write Leader
ClickHouse® does not depend on a single node to accept writes cluster-wide. Instead:
- Writes are routed based on shard logic
- Replication propagates data within shards
- Coordination ensures consistency of parts
This design supports large-scale analytical workloads where horizontal distribution is essential.
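As an illustration, assuming a Distributed table named `events_all` over local replicated `events` tables, a write can enter through any node:

```sql
-- The sharding key (not a primary node) decides which shard
-- receives each row; replicas within that shard then sync asynchronously.
INSERT INTO events_all (event_time, user_id, payload)
VALUES (now(), 42, 'click');
```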
3. Clear Separation of Control and Data Planes
Keeper manages metadata only.
Data storage remains on ClickHouse® nodes, while coordination state resides in Keeper. This separation of control plane and data plane improves scalability and operational clarity.
Implications for Kubernetes Deployments
When running ClickHouse® on Kubernetes, it is common to misinterpret infrastructure primitives through an OLTP lens.
For example:
- A StatefulSet pod does not represent a database primary.
- Restarting a pod does not imply global failover.
- An operator does not manage write leadership.
In reality:
- Kubernetes manages container lifecycle and storage attachment.
- ClickHouse® Keeper manages replication coordination.
- Leadership remains table-scoped and internal to the database.
High availability emerges from shard replication and distributed coordination, not from promoting a single global primary.
Understanding this separation is essential when designing resilient Kubernetes-based analytical platforms.
Rethinking Replication for Analytical Systems
ClickHouse® optimizes for analytical workloads. Consequently, its replication architecture reflects:
- Immutable data parts
- Asynchronous replication
- Background merge coordination
- Horizontal shard-based scaling
Transactional systems optimize for row-level consistency and synchronous write leadership. Analytical systems optimize for distributed reads, scalable ingestion, and fault-tolerant aggregation.
Neither model is superior; they serve different workloads.
But assuming they behave the same can lead to incorrect architectural decisions.
Final Thoughts
ClickHouse® replication is table-scoped.
Coordination is granular.
ClickHouse® Keeper is not a primary node.
For architects building distributed analytical platforms, understanding these distinctions clarifies:
- Failure handling
- Write behavior
- Scaling boundaries
- Kubernetes deployment expectations
Architectural decisions should be grounded in how the system actually coordinates and replicates data, not in assumptions inherited from traditional OLTP databases.
Exploring ClickHouse® for Your Analytics?
At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics, from Kubernetes deployments and migrations to performance tuning in production.
We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production, a story we’ve shared here:
Success Story: Quantrail Bare-Metal ClickHouse® Deployment
If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.
Contact
Quantrail Data
