
ClickHouse® table-scoped replication architecture

Mohamed Hussain S

Engineers approaching ClickHouse® often carry mental models shaped by traditional OLTP databases. In many such systems, replication is instance-wide, leadership is global, and a single primary node coordinates writes and failover.

ClickHouse® does not follow that model.

Its replication architecture is table-scoped, not instance-scoped. Coordination is granular rather than global. And ClickHouse® Keeper is a metadata coordinator, not a primary database node.

Understanding this distinction is essential for architects designing high-availability analytical systems and distributed deployments.

In many transactional databases, replication operates at the instance level:

  • One primary node accepts writes
  • Replicas follow by consuming transaction logs
  • Leadership is cluster-wide
  • Failover promotes a new primary

Replication scope is the entire database instance.

This model centralizes write coordination and simplifies transactional consistency, which is appropriate for OLTP workloads.

Because this pattern is common, it is natural to assume similar behavior in other databases.

But ClickHouse® was built for analytical workloads, and its replication model reflects different design priorities.

ClickHouse® replication operates at the level of ReplicatedMergeTree tables, not entire server instances.

In a typical ClickHouse® cluster:

  • Data distributes across shards
  • Each shard may have multiple replicas
  • Replication is defined per replicated table
  • Each replicated table elects its own leader

There is no global primary server.

Writes are distributed according to sharding logic. Replication is asynchronous. Coordination occurs independently for each replicated table.
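As a concrete sketch, replication is declared per table by choosing the ReplicatedMergeTree engine. The cluster name, table, schema, and Keeper path below are all illustrative, and the `{shard}` and `{replica}` macros are assumed to be configured on each server:

```sql
-- Hypothetical replicated table: each replica of each shard registers itself
-- under the Keeper path, which is parameterized by the {shard} macro.
CREATE TABLE events ON CLUSTER my_cluster
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);
```

The replication scope here is exactly this one table: other tables on the same servers may be unreplicated, or replicated with entirely different topologies.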

This fundamentally changes how leadership and failure domains should be understood.

ClickHouse® Keeper (or ZooKeeper in earlier deployments) acts as a coordination service.

It does not:

  • Store user data
  • Serve queries
  • Act as a database primary

Instead, Keeper manages:

  • Replication metadata
  • Part tracking across replicas
  • Leader election per replicated table
  • Coordination of merges and mutations

Each replicated table elects a leader replica responsible for coordinating background merges and mutations for that table. Keeper facilitates this election and maintains the coordination state.

Leadership is therefore:

  • Scoped per table
  • Internal to replication coordination
  • Independent from query routing

This is not a primary/replica hierarchy. It is a distributed coordination mechanism.
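One way to observe this, assuming a running cluster with replicated tables, is the `system.replicas` system table, which reports coordination state per table rather than per server:

```sql
-- is_leader is reported per table: one server may lead some tables
-- while following others.
SELECT database, table, replica_name, is_leader, zookeeper_path
FROM system.replicas;
```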

In ClickHouse® architecture:

  • Sharding distributes data horizontally across nodes.
  • Replication provides redundancy within each shard.

These are orthogonal design dimensions.

In a multi-shard deployment:

  • Each shard can have multiple replicas
  • Each replicated table within a shard elects its own leader
  • There is no cluster-wide primary

This separation enables horizontal scalability without introducing a single global coordination bottleneck.
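These two dimensions show up directly in the table engines: a replicated table provides redundancy within a shard, while a separate Distributed table fans reads and writes out across shards. The names below are illustrative and assume a local replicated table `events` exists on every shard of `my_cluster`:

```sql
-- Hypothetical Distributed wrapper: Distributed(cluster, database, table, sharding_key).
-- It stores no data itself; it routes to the per-shard replicated tables.
CREATE TABLE events_all AS events
ENGINE = Distributed(my_cluster, default, events, rand());
```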

For architects, this replication model has meaningful consequences.

1. Failure Domains Are Granular

Because coordination is table-scoped:

  • Failure of one replica does not imply global failover
  • Merge leadership is localized
  • Replication health is evaluated per table

This reduces blast radius compared to instance-wide primary models.
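In practice, this means replication health is monitored table by table. A hedged sketch of such a check, with illustrative thresholds, again via `system.replicas`:

```sql
-- Flag individual tables whose replicas are lagging or read-only;
-- a problem here says nothing about other tables on the same server.
SELECT database, table, replica_name, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE is_readonly OR absolute_delay > 300 OR queue_size > 100;
```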

2. There Is No Global Write Leader

ClickHouse® does not depend on a single node to accept writes cluster-wide. Instead:

  • Writes are routed based on shard logic
  • Replication propagates data within shards
  • Coordination ensures consistency of parts

This design supports large-scale analytical workloads where horizontal distribution is essential.
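For example, an insert typically targets either one shard's local table directly or a Distributed table that applies the sharding key; no cluster-wide primary is involved. The `events_all` table below is a hypothetical Distributed wrapper over per-shard replicated tables:

```sql
-- The sharding key configured on events_all decides which shard
-- receives each row; replication then propagates it within that shard.
INSERT INTO events_all (event_date, user_id, payload)
VALUES ('2024-05-01', 42, 'page_view');
```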

3. Clear Separation of Control and Data Planes

Keeper manages metadata only.

Data storage remains on ClickHouse® nodes, while coordination state resides in Keeper. This separation of control plane and data plane improves scalability and operational clarity.

When running ClickHouse® on Kubernetes, it is common to misinterpret infrastructure primitives through an OLTP lens.

For example:

  • A StatefulSet pod does not represent a database primary.
  • Restarting a pod does not imply global failover.
  • An operator does not manage write leadership.

In reality:

  • Kubernetes manages container lifecycle and storage attachment.
  • ClickHouse® Keeper manages replication coordination.
  • Leadership remains table-scoped and internal to the database.

High availability emerges from shard replication and distributed coordination, not from promoting a single global primary.

Understanding this separation is essential when designing resilient Kubernetes-based analytical platforms.
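On Kubernetes, the table-scoped model usually surfaces as per-pod configuration rather than any primary/replica role: each pod carries `{shard}` and `{replica}` macro values, often templated from the StatefulSet pod ordinal. A sketch of such a fragment, with illustrative values and a hypothetical pod naming scheme:

```xml
<!-- Illustrative per-node macros; an operator typically derives these
     from the pod name, e.g. chi-demo-0-0 => shard 0, replica 0. -->
<clickhouse>
    <macros>
        <shard>0</shard>
        <replica>chi-demo-0-0</replica>
    </macros>
</clickhouse>
```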

ClickHouse® optimizes for analytical workloads. Consequently, its replication architecture reflects:

  • Immutable data parts
  • Asynchronous replication
  • Background merge coordination
  • Horizontal shard-based scaling


Transactional systems optimize for row-level consistency and synchronous write leadership. Analytical systems optimize for distributed reads, scalable ingestion, and fault-tolerant aggregation.

Neither model is superior; they serve different workloads.

But assuming they behave the same can lead to incorrect architectural decisions.

ClickHouse® replication is table-scoped.
Coordination is granular.
ClickHouse® Keeper is not a primary node.

For architects building distributed analytical platforms, understanding these distinctions clarifies:

  • Failure handling
  • Write behavior
  • Scaling boundaries
  • Kubernetes deployment expectations

Architectural decisions should be grounded in how the system actually coordinates and replicates data – not in assumptions inherited from traditional OLTP databases.

At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics, from Kubernetes deployments and migrations to performance tuning in production.

We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production, a story we’ve shared here:

Success Story: Quantrail Bare-Metal ClickHouse® Deployment

If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.

Contact
Quantrail Data
