ClickHouse : Introduction to VersionedCollapsingMergeTree

Introduction

The VersionedCollapsingMergeTree table engine extends the MergeTree engine and enhances the functionality of the CollapsingMergeTree engine. Let us look at more details about VersionedCollapsingMergeTree table engine in this article.

The CollapsingMergeTree engine collapses rows effectively only when they follow a specific order. However, inserting rows simultaneously from multiple threads can disrupt this order. The VersionedCollapsingMergeTree engine solves this problem by introducing a version column.

When creating the table, a sign column (of type Int8) is required. It can contain either 1 or -1:

  • Rows with 1 are called state rows.
  • Rows with -1 are called cancel rows.

The row collapsing mechanism uses both the version column and the sign column to function. This engine removes rows that have opposite sign values but share the same sorting key and version.

This engine is especially useful in scenarios that require frequent updates and deletions. Collapsing rows is far more efficient than performing updates and deletions through ALTER statements. Since frequent and extensive mutations can degrade ClickHouse performance, this engine ensures the same speed and efficiency as the MergeTree engine.

Creating a Table with VersionedCollapsingMergeTree Engine

Here’s an example of how to create and use a VersionedCollapsingMergeTree table in ClickHouse.

CREATE TABLE example_table
(
    id UInt32,                -- Primary key 
    data String,              -- Some sample data
    version UInt32,           -- Version column 
    sign Int8                 -- Sign column 
)
ENGINE = VersionedCollapsingMergeTree(version, sign)
ORDER BY id;

Great! We insert records with matching id values but different sign values to demonstrate how collapsing works.

INSERT INTO example_table VALUES (1, 'A', 1, 1);  -- State row for id = 1
INSERT INTO example_table VALUES (2, 'B', 1, 1);  -- State row for id = 2
INSERT INTO example_table VALUES (1, 'A', 1, -1); -- Cancel row for id = 1 (same version)
INSERT INTO example_table VALUES (3, 'C', 2, 1);  -- State row for id = 3

To verify the inserted data, execute the following query:

SELECT * FROM example_table ORDER BY id, version;

Output:


id	data	version	sign
2	B	1	1
3	C	2	1

Since id = 1 had a state row (1, ‘A’, 1, 1) and a cancel row (1, ‘A’, 1, -1) with the same version, ClickHouse removed them.

Updating a Row with VersionedCollapsingMergeTree

we handle updates by inserting a new state row with a higher version and marking the old row with a cancel row (sign = -1).

INSERT INTO example_table VALUES (5, 'Old Data', 1, 1);
INSERT INTO example_table VALUES (5, 'Old Data', 1, -1);  -- Cancel old row
INSERT INTO example_table VALUES (5, 'New Data', 2, 1);   -- Insert new state row

Verify the updated data using:

SELECT * FROM example_table ORDER BY id, version;

Output:

id	data	version	sign
2	B	1	1
3	C	2	1
5	New Data	2	1

Note

If the newly inserted rows have not yet merged, you can execute the OPTIMIZE TABLE statement to trigger a merge manually.

Conclusion

The VersionedCollapsingMergeTree engine efficiently removes rows that have the same sorting key and version but opposite sign values. This process is significantly faster than executing mutations (ALTER statements). Since merging occurs in the background, rows are automatically collapsed, enhancing ClickHouse’s performance without manual intervention.

References

https://clickhouse.com/docs/engines/table-engines/mergetree-family/versionedcollapsingmergetree

Image Courtesy: Photo by Nubia Navarro (nubikini) from Pexels