Introduction
The VersionedCollapsingMergeTree table engine extends the MergeTree engine and enhances the functionality of the CollapsingMergeTree engine. Let us look at more details about VersionedCollapsingMergeTree table engine in this article.
The CollapsingMergeTree engine collapses rows effectively only when they follow a specific order. However, inserting rows simultaneously from multiple threads can disrupt this order. The VersionedCollapsingMergeTree engine solves this problem by introducing a version column.
When creating the table, a sign column (of type Int8) is required. It can contain either 1 or -1:
- Rows with 1 are called state rows.
- Rows with -1 are called cancel rows.
The row collapsing mechanism uses both the version column and the sign column to function. This engine removes rows that have opposite sign values but share the same sorting key and version.
This engine is especially useful in scenarios that require frequent updates and deletions. Collapsing rows is far more efficient than performing updates and deletions through ALTER statements. Since frequent and extensive mutations can degrade ClickHouse performance, this engine ensures the same speed and efficiency as the MergeTree engine.
Creating a Table with VersionedCollapsingMergeTree Engine
Here’s an example of how to create and use a VersionedCollapsingMergeTree table in ClickHouse.
CREATE TABLE example_table
(
id UInt32, -- Primary key
data String, -- Some sample data
version UInt32, -- Version column
sign Int8 -- Sign column
)
ENGINE = VersionedCollapsingMergeTree(version, sign)
ORDER BY id;
Great! We insert records with matching id values but different sign values to demonstrate how collapsing works.
INSERT INTO example_table VALUES (1, 'A', 1, 1); -- State row for id = 1
INSERT INTO example_table VALUES (2, 'B', 1, 1); -- State row for id = 2
INSERT INTO example_table VALUES (1, 'A', 1, -1); -- Cancel row for id = 1 (same version)
INSERT INTO example_table VALUES (3, 'C', 2, 1); -- State row for id = 3
To verify the inserted data, execute the following query:
SELECT * FROM example_table ORDER BY id, version;
Output:
id data version sign
2 B 1 1
3 C 2 1
Since id = 1 had a state row (1, ‘A’, 1, 1) and a cancel row (1, ‘A’, 1, -1) with the same version, ClickHouse removed them.
Updating a Row with VersionedCollapsingMergeTree
we handle updates by inserting a new state row with a higher version and marking the old row with a cancel row (sign = -1
).
INSERT INTO example_table VALUES (5, 'Old Data', 1, 1);
INSERT INTO example_table VALUES (5, 'Old Data', 1, -1); -- Cancel old row
INSERT INTO example_table VALUES (5, 'New Data', 2, 1); -- Insert new state row
Verify the updated data using:
SELECT * FROM example_table ORDER BY id, version;
Output:
id data version sign
2 B 1 1
3 C 2 1
5 New Data 2 1
Note
If the newly inserted rows have not yet merged, you can execute the OPTIMIZE TABLE statement to trigger a merge manually.
Conclusion
The VersionedCollapsingMergeTree engine efficiently removes rows that have the same sorting key and version but opposite sign values. This process is significantly faster than executing mutations (ALTER
statements). Since merging occurs in the background, rows are automatically collapsed, enhancing ClickHouse’s performance without manual intervention.
References
https://clickhouse.com/docs/engines/table-engines/mergetree-family/versionedcollapsingmergetree
Image Courtesy: Photo by Nubia Navarro (nubikini) from Pexels