
ClickHouse® table-scoped replication architecture

Mohamed Hussain S

Engineers approaching ClickHouse® often carry mental models shaped by traditional OLTP databases. In many such systems, replication is instance-wide, leadership is global, and a single primary node coordinates writes and failover.

ClickHouse® does not follow that model.

Its replication architecture is table-scoped, not instance-scoped. Coordination is granular rather than global. And ClickHouse® Keeper is a metadata coordinator, not a primary database node.

Understanding this distinction is essential for architects designing high-availability analytical systems and distributed deployments.

In many transactional databases, replication operates at the instance level:

  • One primary node accepts writes
  • Replicas follow by consuming transaction logs
  • Leadership is cluster-wide
  • Failover promotes a new primary

Replication scope is the entire database instance.

This model centralizes write coordination and simplifies transactional consistency, which is appropriate for OLTP workloads.

Because this pattern is common, it is natural to assume similar behavior in other databases.

But ClickHouse® was built for analytical workloads, and its replication model reflects different design priorities.

ClickHouse® replication operates at the level of ReplicatedMergeTree tables, not entire server instances.

In a typical ClickHouse® cluster:

  • Data distributes across shards
  • Each shard may have multiple replicas
  • Replication is defined per replicated table
  • Each replicated table elects its own leader

There is no global primary server.

Writes are distributed according to sharding logic. Replication is asynchronous. Coordination occurs independently for each replicated table.
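As a concrete sketch, replication is declared per table by choosing the ReplicatedMergeTree engine. The cluster name, table, schema, and Keeper path below are all illustrative, and the `{shard}` and `{replica}` macros are assumed to be configured on each server:

```sql
-- Hypothetical replicated table: each replica of each shard registers itself
-- under the Keeper path, which is parameterized by the {shard} macro.
CREATE TABLE events ON CLUSTER my_cluster
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);
```

The replication scope here is exactly this one table: other tables on the same servers may be unreplicated, or replicated with entirely different topologies.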

This fundamentally changes how leadership and failure domains should be understood.

ClickHouse® Keeper (or ZooKeeper in earlier deployments) acts as a coordination service.

It does not:

  • Store user data
  • Serve queries
  • Act as a database primary

Instead, Keeper manages:

  • Replication metadata
  • Part tracking across replicas
  • Leader election per replicated table
  • Coordination of merges and mutations

Each replicated table elects a leader replica responsible for coordinating background merges and mutations for that table. Keeper facilitates this election and maintains the coordination state.

Leadership is therefore:

  • Scoped per table
  • Internal to replication coordination
  • Independent from query routing

This is not a primary/replica hierarchy. It is a distributed coordination mechanism.
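One way to observe this, assuming a running cluster with replicated tables, is the `system.replicas` system table, which reports coordination state per table rather than per server:

```sql
-- is_leader is reported per table: one server may lead some tables
-- while following others.
SELECT database, table, replica_name, is_leader, zookeeper_path
FROM system.replicas;
```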

In ClickHouse® architecture:

  • Sharding distributes data horizontally across nodes.
  • Replication provides redundancy within each shard.

These are orthogonal design dimensions.

In a multi-shard deployment:

  • Each shard can have multiple replicas
  • Each replicated table within a shard elects its own leader
  • There is no cluster-wide primary

This separation enables horizontal scalability without introducing a single global coordination bottleneck.
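These two dimensions show up directly in the table engines: a replicated table provides redundancy within a shard, while a separate Distributed table fans reads and writes out across shards. The names below are illustrative and assume a local replicated table `events` exists on every shard of `my_cluster`:

```sql
-- Hypothetical Distributed wrapper: Distributed(cluster, database, table, sharding_key).
-- It stores no data itself; it routes to the per-shard replicated tables.
CREATE TABLE events_all AS events
ENGINE = Distributed(my_cluster, default, events, rand());
```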

For architects, this replication model has meaningful consequences.

1. Failure Domains Are Granular

Because coordination is table-scoped:

  • Failure of one replica does not imply global failover
  • Merge leadership is localized
  • Replication health is evaluated per table

This reduces blast radius compared to instance-wide primary models.
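In practice, this means replication health is monitored table by table. A hedged sketch of such a check, with illustrative thresholds, again via `system.replicas`:

```sql
-- Flag individual tables whose replicas are lagging or read-only;
-- a problem here says nothing about other tables on the same server.
SELECT database, table, replica_name, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE is_readonly OR absolute_delay > 300 OR queue_size > 100;
```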

2. There Is No Global Write Leader

ClickHouse® does not depend on a single node to accept writes cluster-wide. Instead:

  • Writes are routed based on shard logic
  • Replication propagates data within shards
  • Coordination ensures consistency of parts

This design supports large-scale analytical workloads where horizontal distribution is essential.
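For example, an insert typically targets either one shard's local table directly or a Distributed table that applies the sharding key; no cluster-wide primary is involved. The `events_all` table below is a hypothetical Distributed wrapper over per-shard replicated tables:

```sql
-- The sharding key configured on events_all decides which shard
-- receives each row; replication then propagates it within that shard.
INSERT INTO events_all (event_date, user_id, payload)
VALUES ('2024-05-01', 42, 'page_view');
```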

3. Clear Separation of Control and Data Planes

Keeper manages metadata only.

Data storage remains on ClickHouse® nodes, while coordination state resides in Keeper. This separation of control plane and data plane improves scalability and operational clarity.

When running ClickHouse® on Kubernetes, it is common to misinterpret infrastructure primitives through an OLTP lens.

For example:

  • A StatefulSet pod does not represent a database primary.
  • Restarting a pod does not imply global failover.
  • An operator does not manage write leadership.

In reality:

  • Kubernetes manages container lifecycle and storage attachment.
  • ClickHouse® Keeper manages replication coordination.
  • Leadership remains table-scoped and internal to the database.

High availability emerges from shard replication and distributed coordination, not from promoting a single global primary.

Understanding this separation is essential when designing resilient Kubernetes-based analytical platforms.
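On Kubernetes, the table-scoped model usually surfaces as per-pod configuration rather than any primary/replica role: each pod carries `{shard}` and `{replica}` macro values, often templated from the StatefulSet pod ordinal. A sketch of such a fragment, with illustrative values and a hypothetical pod naming scheme:

```xml
<!-- Illustrative per-node macros; an operator typically derives these
     from the pod name, e.g. chi-demo-0-0 => shard 0, replica 0. -->
<clickhouse>
    <macros>
        <shard>0</shard>
        <replica>chi-demo-0-0</replica>
    </macros>
</clickhouse>
```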

ClickHouse® optimizes for analytical workloads. Consequently, its replication architecture reflects:

  • Immutable data parts
  • Asynchronous replication
  • Background merge coordination
  • Horizontal shard-based scaling


Transactional systems optimize for row-level consistency and synchronous write leadership. Analytical systems optimize for distributed reads, scalable ingestion, and fault-tolerant aggregation.

Neither model is superior; they serve different workloads.

But assuming they behave the same can lead to incorrect architectural decisions.

ClickHouse® replication is table-scoped.
Coordination is granular.
ClickHouse® Keeper is not a primary node.

For architects building distributed analytical platforms, understanding these distinctions clarifies:

  • Failure handling
  • Write behavior
  • Scaling boundaries
  • Kubernetes deployment expectations

Architectural decisions should be grounded in how the system actually coordinates and replicates data – not in assumptions inherited from traditional OLTP databases.

At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics, from Kubernetes deployments and migrations to performance tuning in production.

We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production, a story we’ve shared here:

Success Story: Quantrail Bare-Metal ClickHouse® Deployment

If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.

Contact
Quantrail Data
