Introduction
Deleting and updating records are essential operations in transactional databases, and ClickHouse provides support for these operations.
Using the MergeTree table engine and the ALTER TABLE …
statement, you can perform these operations through mutations that run asynchronously in the background. While mutations rewrite data parts and can increase hardware usage, frequent mutations may negatively impact performance. For a more efficient approach to updates and deletes, consider using the CollapsingMergeTree table engine.
CollapsingMergeTree
The MergeTree table engine includes a variant that requires a mandatory sign column with the Int8
data type. This sign column must contain either 1
or -1
:
- Rows with a value of
1
are designated as state rows. - Rows with a value of
-1
are classified as cancel rows.
The engine effectively deletes rows that have different sign values but share the same sorting key. The collapsing logic follows a straightforward process: after sorting rows based on the specified ORDER BY
columns, the engine systematically processes each group of rows with identical sorting keys.
Collapsing Logic:
- When there is an equal number of state and cancel rows:
- If the last row is a cancel row (
-1
), all rows collapse, resulting in an empty group. - If the last row is a state row (
1
), the first cancel row and the last state row must be retained in the group.
- If the last row is a cancel row (
- When there is an unequal number of state and cancel rows:
- If there are more state rows, retain the last state row.
- If there are more cancel rows, retain the first cancel row.
Getting Started
We will create a table using the CollapsingMergeTree engine. The table will have three columns: ID
, Name
, and Sign
(the mandatory sign column). Sorting will be based on the ID
column.
Table Creation:
CREATE TABLE collapsing_mergetree
(
ID UInt64,
name String,
Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY ID;
Next, we insert two rows of data to demonstrate both update and deletion processes.
Insert Data:
INSERT INTO collapsing_mergetree VALUES
(22, 'a', 1), (33, 'b', 1);
Delete Operation
To delete a row, insert a copy of the row with -1
in the sign column. The CollapsingMergeTree engine will efficiently handle the operation.
INSERT INTO collapsing_mergetree (ID, Sign) VALUES (22, -1);
Only the sorting key values and the sign column need to be included.
Update Operation
Updating a record involves two steps:
- Deleting the old record based on the sorting key.
- Inserting the new record with the updated values.
Update Example:
INSERT INTO collapsing_mergetree (ID, Sign) VALUES (33, -1);
INSERT INTO collapsing_mergetree VALUES (33, 'c', 1);
To successfully execute the update operation, both deletion and insertion must occur.
Note
If the newly inserted rows do not trigger any merges, using the OPTIMIZE TABLE
statement will initialize an unscheduled merge of data parts for tables.
Conclusion
The CollapsingMergeTree engine consolidates rows that share the same sorting key but have different signs. Deletes and updates can be executed using the INSERT
operation with this engine. Additionally, these operations can also be performed through the ALTER TABLE
statement. The process of merging data rows with the same sorting key and different signs occurs in the background during merge operations.
References
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/collapsingmergetree
Image Courtesy: Photo by Ylanite Koppens from Pexels