Operating ClickHouse on Kubernetes

Sanjeev Kumar G

As Kubernetes becomes the default platform for deploying modern infrastructure, teams increasingly want their data systems to run there as well. Analytical workloads are no exception. Naturally, the question arises:

Can we run ClickHouse on Kubernetes in a production-grade way?

The short answer is yes.
The more accurate answer is: not safely or efficiently without additional tooling.

This article explains:

  • Why deploying ClickHouse on Kubernetes is non-trivial
  • The limitations of using StatefulSets directly
  • The operator pattern and why it exists
  • How the Altinity Kubernetes Operator for ClickHouse addresses these challenges

This part focuses on architectural reasoning. Implementation details will follow in the next article.

Scope of This Article

This article focuses specifically on the Altinity Kubernetes Operator for ClickHouse, developed by Altinity, and its approach to managing distributed ClickHouse clusters on Kubernetes.

If you are looking for a broader overview of ClickHouse on Kubernetes or alternative operator implementations, refer to the official ClickHouse documentation and related resources. You can also refer to the blog post below to learn more about ClickHouse's own Kubernetes operator.

The Nature of ClickHouse in Production

ClickHouse is a distributed, column-oriented database optimized for analytical workloads. In real-world deployments, it is rarely a single-node system.

A typical production setup may include:

  • Multiple shards for horizontal scaling
  • Multiple replicas per shard for availability
  • A coordination service such as ZooKeeper or ClickHouse Keeper
  • Distributed tables spanning shards
  • Background merge processes
  • Replication paths and macros that must remain consistent

This means a “ClickHouse deployment” is not simply a container with persistent storage. It is a coordinated distributed system with strict topology and configuration requirements.

Kubernetes provides primitives for running containers. It does not understand database topology.

That distinction is fundamental.

What Kubernetes Provides

Kubernetes offers several building blocks for stateful workloads:

  • Pods
  • PersistentVolumeClaims
  • StatefulSets
  • Services
  • ConfigMaps and Secrets

A StatefulSet ensures:

  • Stable network identity
  • Stable storage association
  • Ordered startup and shutdown

For many stateful services, this is sufficient.
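To make "stable identity" concrete, here is a minimal Python sketch of the naming rules a StatefulSet follows: pods are named by ordinal, each pod gets a DNS record under the governing headless Service, and each PersistentVolumeClaim is named after its claim template and pod. The function name, the example names, and the default cluster domain cluster.local are assumptions made for illustration.

```python
def statefulset_identities(name, service, namespace, replicas, claim_template="data"):
    """Return the stable pod name, DNS name, and PVC name for each ordinal.

    Assumes the default cluster domain "cluster.local"; all argument values
    used below are hypothetical.
    """
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"  # <statefulset-name>-<ordinal>
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.cluster.local",
            "pvc": f"{claim_template}-{pod}",  # <claim-template>-<pod-name>
        })
    return identities


for identity in statefulset_identities("clickhouse", "clickhouse-headless", "analytics", 3):
    print(identity)
```

Note that nothing in these names says which shard a pod belongs to or which pods are replicas of each other. That information has to come from somewhere else.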

However, Kubernetes does not provide:

  • Awareness of database shards and replicas
  • Automatic generation of cluster configuration
  • Safe orchestration of database upgrades
  • Replica coordination logic
  • Topology-aware scaling

If you deploy ClickHouse directly using StatefulSets, you are responsible for:

  • Creating and maintaining multiple StatefulSets
  • Ensuring consistent cluster configuration across nodes
  • Managing replica paths and macros
  • Handling rolling upgrades manually
  • Avoiding split-brain or replication misconfiguration
  • Coordinating topology changes

This quickly becomes operationally complex.

The Gap Between Infrastructure and Database Intent

Kubernetes operates on infrastructure-level abstractions.

You declare:

replicas: 3

Kubernetes works to keep three pods running.

But database intent is different. You may want:

  • Two shards with two replicas each
  • Replication enabled with consistent paths
  • Distributed tables across shards
  • Specific storage templates applied
  • Controlled rolling upgrades
  • Safe scaling operations

These are not native Kubernetes concepts. They represent application-specific logic.
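As a rough illustration of the gap, the Python sketch below expands the first item in that list ("two shards with two replicas each") into one workload entry per shard and replica. The ClusterIntent class, the expand_intent function, and the naming scheme are hypothetical. They are not the Altinity operator's actual naming or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterIntent:
    """Database-level intent (hypothetical structure for illustration)."""
    name: str
    shards: int
    replicas_per_shard: int


def expand_intent(intent):
    """Translate database intent into one workload entry per (shard, replica)."""
    objects = []
    for shard in range(intent.shards):
        for replica in range(intent.replicas_per_shard):
            objects.append({
                # Illustrative naming scheme, not the operator's real one.
                "workload": f"{intent.name}-{shard}-{replica}",
                "shard": shard,
                "replica": replica,
            })
    return objects


intent = ClusterIntent(name="analytics", shards=2, replicas_per_shard=2)
for obj in expand_intent(intent):
    print(obj["workload"])
```

Two numbers of intent already fan out into four workloads, each of which also needs its own storage, service wiring, and configuration.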

Without additional tooling, you must manually translate database architecture into low-level Kubernetes objects. This translation layer becomes fragile and difficult to maintain as the system grows.

This is precisely the problem the operator pattern is designed to solve.

The Operator Pattern

The Kubernetes Operator pattern extends the Kubernetes control plane with application-specific knowledge.

An operator is essentially:

  • A controller running inside the cluster
  • Watching custom resources
  • Continuously reconciling desired state with actual state

Instead of managing multiple StatefulSets and Services directly, you define a higher-level custom resource that represents your database cluster.

The operator interprets this resource and generates the required Kubernetes objects automatically.

More importantly, it continues to monitor and reconcile the system over time.

This moves responsibility from manual infrastructure management to automated domain-aware control.
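The reconciliation idea can be sketched in a few lines of Python. This is a simplified, in-memory model and not the code of any real operator: desired and actual are plain dictionaries standing in for the declared resource and the live cluster, and the object names and image tags are made up.

```python
def plan_reconcile(desired, actual):
    """Compare desired and actual object specs and return the actions needed."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired, actual):
    """Apply the planned actions to `actual`, an in-memory stand-in for the cluster."""
    for action, name in plan_reconcile(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actual


# Hypothetical object names and image tags, for illustration only.
desired = {"chi-0-0": {"image": "clickhouse:new"}, "chi-0-1": {"image": "clickhouse:new"}}
actual = {"chi-0-0": {"image": "clickhouse:old"}, "stale": {"image": "clickhouse:old"}}

print(plan_reconcile(desired, actual))
reconcile(desired, actual)
print(plan_reconcile(desired, actual))  # nothing left to do once states match
```

A real controller runs this comparison repeatedly, triggered by changes to the resources it watches, rather than once.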

The Role of the Altinity ClickHouse Operator

The Altinity Kubernetes Operator for ClickHouse embeds operational knowledge about running ClickHouse inside Kubernetes.

At a high level, it provides:

Topology Modeling

You define shards and replicas declaratively.
The operator generates the corresponding StatefulSets, Services, and configuration.

You describe database architecture.
The operator implements it using Kubernetes primitives.

Configuration Management

ClickHouse nodes must share consistent cluster configuration.
The operator:

  • Generates cluster definitions
  • Ensures replica configuration is aligned
  • Manages macros and replication paths
  • Keeps configuration synchronized across nodes

This significantly reduces configuration drift.
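To show what "macros and replication paths" means in practice, here is a small Python sketch that renders a per-replica macros fragment and a replication path template that depends on it. The macro names cluster, shard, and replica follow a common ClickHouse convention. The function names, the zero-padded shard format, and the host names are assumptions, and the configuration the operator actually generates may differ.

```python
from xml.sax.saxutils import escape


def render_macros(cluster, shard, replica_host):
    """Render a ClickHouse macros config fragment for one replica (illustrative)."""
    macros = {"cluster": cluster, "shard": f"{shard:02d}", "replica": replica_host}
    body = "\n".join(
        f"        <{key}>{escape(value)}</{key}>" for key, value in macros.items()
    )
    return f"<clickhouse>\n    <macros>\n{body}\n    </macros>\n</clickhouse>\n"


def replica_path(table):
    """Keeper path template; ClickHouse substitutes {shard} from the macros."""
    return f"/clickhouse/tables/{{shard}}/{table}"


# Hypothetical cluster and host names.
print(render_macros("analytics", 1, "chi-analytics-1-0"))
print(replica_path("events"))
```

Every replica of a shard must agree on the shard value and differ on the replica value. Maintaining that by hand across many nodes is where configuration drift tends to creep in.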

Safe Rolling Updates

Database updates must preserve availability and consistency.

The operator orchestrates:

  • Ordered restarts
  • Readiness checks
  • Controlled rollout of new versions

This minimizes downtime and reduces the risk of cluster instability.
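The general shape of such a rollout can be sketched as follows. This is a generic Python illustration of "restart one host, wait for readiness, then continue", not the operator's actual algorithm. The restart and is_ready callbacks, the host names, and the max_checks limit are all hypothetical.

```python
def rolling_update(hosts, restart, is_ready, max_checks=5):
    """Restart hosts one at a time; stop at the first host that never becomes ready.

    Returns (updated_hosts, failed_host), where failed_host is None on success.
    """
    updated = []
    for host in hosts:
        restart(host)
        for _ in range(max_checks):
            if is_ready(host):
                break
        else:
            # Halting here leaves the remaining hosts on the old version.
            return updated, host
        updated.append(host)
    return updated, None


# Fake cluster: each host becomes ready after a fixed number of checks.
ready_after = {"chi-0-0": 1, "chi-0-1": 2}
checks = {}


def restart(host):
    checks[host] = 0


def is_ready(host):
    checks[host] += 1
    return checks[host] >= ready_after[host]


print(rolling_update(["chi-0-0", "chi-0-1"], restart, is_ready))
```

A real implementation would wait between readiness checks and would typically avoid restarting all replicas of a shard at the same time.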

Storage Templates

Instead of defining storage repeatedly for each node, you define storage templates once.
The operator applies them consistently across shards and replicas.

This improves standardization and reduces configuration errors.
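The idea of "define once, apply everywhere" can be illustrated with a short Python sketch that stamps one PersistentVolumeClaim manifest per host from a shared template. The render_pvcs function, the template layout, the host names, and the 100Gi size are assumptions made for the example, not the operator's template format.

```python
import copy


def render_pvcs(template, hosts):
    """Stamp one PersistentVolumeClaim manifest per host from a shared template."""
    claims = []
    for host in hosts:
        claims.append({
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": f"{template['name']}-{host}"},
            # Deep copy so one claim can never mutate another by accident.
            "spec": copy.deepcopy(template["spec"]),
        })
    return claims


# Hypothetical template and host names.
data_volume = {
    "name": "data-volume",
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "100Gi"}},
    },
}

for claim in render_pvcs(data_volume, ["chi-0-0", "chi-0-1", "chi-1-0", "chi-1-1"]):
    print(claim["metadata"]["name"], claim["spec"]["resources"]["requests"]["storage"])
```

Changing the storage size in one place then changes it for every shard and replica.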

Continuous Reconciliation

If a pod fails, a pod is rescheduled onto another node, or configuration drifts, the operator detects the discrepancy and attempts to restore the declared state.

This reconciliation loop is the core strength of the operator model.

Why Helm Alone Is Not Enough

Helm is a package manager built around templating. It renders Kubernetes manifests and applies them when you install or upgrade a release.

It does not:

  • Continuously monitor cluster state
  • React to topology changes
  • Apply domain-specific reconciliation logic

For stateless services, rendering and applying manifests is often sufficient.

For distributed databases, continuous reconciliation and topology awareness are essential. That is where an operator provides tangible value beyond templating.

A Shift in Abstraction

Without an operator, you manage infrastructure objects.

With an operator, you manage database intent.

Instead of reasoning about:

  • How many StatefulSets to create
  • How to wire Services manually
  • How to synchronize configuration

You reason about:

  • How many shards you need
  • How many replicas you require
  • What storage profile applies
  • Which version should be running

This abstraction layer is what makes operating ClickHouse on Kubernetes sustainable in the long term.

Conclusion

Running ClickHouse on Kubernetes is entirely feasible, but it is not a trivial StatefulSet deployment. ClickHouse is a distributed system with strict topology and configuration requirements.

Kubernetes provides the necessary infrastructure primitives.
The operator provides database-specific intelligence.

In the next part of this series, we will examine the ClickHouseInstallation custom resource in detail. We will break down its structure field by field and analyze how the operator translates it into Kubernetes objects.

Understanding that translation is key to operating ClickHouse reliably in a Kubernetes environment.

References

https://kubernetes.io/docs/concepts/extend-kubernetes/operator

https://docs.altinity.com/altinitykubernetesoperator