
Scaling ClickHouse with Sharding: When One Node Isn’t Enough

By Mohamed Hussain S

As datasets grow, even the most powerful ClickHouse node will eventually reach its storage or CPU limits.
The good news is that ClickHouse’s sharding features make it possible to scale horizontally without sacrificing speed or stability.

In this post, we’ll explore how sharding works, when it’s worth using, and how to add new nodes when your current server is maxed out – complete with practical examples.


While ClickHouse is extremely efficient, a single server still has physical limits:

  • Disk Space → As a disk nears capacity, large background merges can fail, since a merge temporarily needs free space comparable to the parts it combines.
  • CPU Saturation → High query concurrency drives response times up.
  • Memory Pressure → Large aggregations or joins can exceed available RAM.
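
A quick way to see how close a node is to these limits is to query ClickHouse's system tables. As a minimal sketch, this checks per-disk usage via the built-in system.disks table (the alert threshold is up to you):

SELECT
    name,
    formatReadableSize(free_space)  AS free,
    formatReadableSize(total_space) AS total,
    round((1 - free_space / total_space) * 100, 1) AS used_pct
FROM system.disks;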

When scaling vertically (buying a bigger machine) isn’t cost-effective, sharding – splitting data across multiple servers – becomes the smarter option.


A shard is simply a partition of your dataset stored on a specific node.
When you query a Distributed table, ClickHouse automatically sends the query to all shards and merges the results before returning them.

For instance, here's a minimal cluster configuration (typically dropped into /etc/clickhouse-server/config.d/ as its own file):

<yandex>
  <remote_servers>
    <my_cluster>
      <shard>
        <weight>1</weight>
        <replica>
          <host>clickhouse-shard1</host>
          <port>9000</port>
        </replica>
      </shard>
      <shard>
        <weight>10</weight>
        <replica>
          <host>clickhouse-shard2</host>
          <port>9000</port>
        </replica>
      </shard>
    </my_cluster>
  </remote_servers>
</yandex>
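
ClickHouse reloads files under config.d at runtime, so cluster changes take effect without a restart. You can confirm the topology and weights from any node via the system.clusters table:

SELECT cluster, shard_num, shard_weight, host_name, port
FROM system.clusters
WHERE cluster = 'my_cluster';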

Here:

  • Shard 1 → weight 1
  • Shard 2 → weight 10 (receives ~10× more data)

Weights control how inserts through a Distributed table are routed: each shard receives a share of new rows proportional to its weight, so Shard 2 gets about 10/11 of them. Weights have no effect on reads – queries still fan out to every shard.

If Shard 1 is running out of disk space:

  1. Provision a new ClickHouse node (same version & configs).
  2. Update the cluster config to add the new shard.
  3. Create matching tables on the new shard.
  4. Adjust shard weights to route more inserts to the new server.

Example – new shard config:

<shard>
  <weight>5</weight>
  <replica>
    <host>clickhouse-shard3</host>
    <port>9000</port>
  </replica>
</shard>

Let’s say we have a Distributed table setup:

CREATE TABLE shard_local
(
    id UInt64,
    name String
)
ENGINE = MergeTree()
ORDER BY id;

-- Distributed(cluster, database, table, sharding_key):
-- inserts into shard_dist are forwarded to shard_local on a shard picked via rand()
CREATE TABLE shard_dist AS shard_local
ENGINE = Distributed(my_cluster, default, shard_local, rand());
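
Step 3 of the checklist above – creating matching tables on the new shard – can be done by running the same CREATE TABLE on clickhouse-shard3 directly, or in one statement if distributed DDL is configured (ON CLUSTER requires ClickHouse Keeper/ZooKeeper):

-- creates shard_local on every node in my_cluster that doesn't already have it
CREATE TABLE IF NOT EXISTS shard_local ON CLUSTER my_cluster
(
    id UInt64,
    name String
)
ENGINE = MergeTree()
ORDER BY id;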

Insert 10,000 rows:

INSERT INTO shard_dist
SELECT number, concat('name_', toString(number))
FROM numbers(10000);

Because of the weight difference (1 vs 10), Shard 2 ends up with roughly ten times as many rows as Shard 1 – about 9,100 versus 900 of the 10,000 inserted (the rand() key makes the split approximate). You can verify the distribution with:

SELECT count() FROM shard_local;

(on each shard)
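
To check the same thing from a single node, the Distributed engine also exposes a virtual _shard_num column you can group by:

SELECT _shard_num, count() AS rows
FROM shard_dist
GROUP BY _shard_num
ORDER BY _shard_num;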

Scaling out this way has clear operational benefits:

  • No downtime – Add shards without stopping queries.
  • Linear scalability – More shards mean more storage and CPU.
  • Cost-efficient – Scale horizontally instead of buying one huge server.

Adding a new shard doesn't automatically move existing data – only new inserts follow the updated weights.
If you want to redistribute existing data:

  • Re-insert it through the Distributed table with INSERT ... SELECT, as sketched below.
  • Or balance manually with partition operations: ALTER TABLE ... DETACH PARTITION on the old shard, copy the detached parts across, then ALTER TABLE ... ATTACH PARTITION on the new one.
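
A sketch of the first option, assuming you can afford to rewrite the data and pause inserts while it runs:

-- on each existing shard: set the old data aside
RENAME TABLE shard_local TO shard_local_old;
CREATE TABLE shard_local AS shard_local_old;  -- same schema and engine

-- push the old rows back through the Distributed table;
-- they are re-routed across all shards using the current weights
INSERT INTO shard_dist SELECT * FROM shard_local_old;

-- once the copy is verified, drop the staging table
DROP TABLE shard_local_old;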

Sharding is one of ClickHouse’s most powerful scaling tools.
By planning for horizontal growth early, you can avoid last-minute migrations when storage or CPU limits hit.

Whether you’re working with billions of rows in analytics, IoT time-series, or real-time event processing, knowing how to add and balance shards ensures your ClickHouse cluster stays ready for future growth.

At Quantrail Data, we help teams:

  • Design scalable ClickHouse architectures
  • Optimize queries for distributed clusters
  • Manage data migration & rebalancing with minimal downtime

Let’s make your ClickHouse cluster future-proof – Get in touch.
