Skip to main content
Skip to main content

Part merges

What are part merges in ClickHouse?


ClickHouse is fast not just for queries but also for inserts, thanks to its storage layer, which operates similarly to LSM trees:

① Inserts (into tables from the MergeTree engine family) create sorted, immutable data parts.

② All data processing is offloaded to background part merges.

This makes data writes lightweight and highly efficient.

To control the number of partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. per table and implement ② above, ClickHouse continuously merges (per partition) smaller partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. into larger ones in the background until they reach a compressed size of approximately ~150 GB.

The following diagram sketches this background merge process:


The merge level of a part is incremented by one with each additional merge. A level of 0 means the part is new and has not been merged yet. PartsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that were merged into larger partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. are marked as inactive and finally deleted after a configurable time (8 minutes by default). Over time, this creates a tree of merged partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key.. Hence the name merge tree table.

Monitoring merges

In the what are table parts example, we showed that ClickHouse tracks all table partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. in the parts system table. We used the following query to retrieve the merge level and the number of stored rows per active part of the example table:

The previously documented query result shows that the example table had four active partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key., each created from a single merge of the initially inserted partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key.:

Running the query now shows that the four partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. have since merged into a single final part (as long as there are no further inserts into the table):

In ClickHouse 24.10, a new merges dashboard was added to the built-in monitoring dashboards. Available in both OSS and Cloud via the /merges HTTP handler, we can use it to visualize all part merges for our example table:

PART MERGES

The recorded dashboard above captures the entire process, from the initial data inserts to the final merge into a single part:

① Number of active partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key..

② Part merges, visually represented with boxes (size reflects part size).

Write amplification.

Concurrent merges

A single ClickHouse server uses several background merge threads to execute concurrent part merges:


Each merge thread executes a loop:

① Decide which partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. to merge next, and load these partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. into memory.

② Merge the partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. in memory into a larger part.

③ Write the merged part to disk.

Go to ①

Note that increasing the number of CPU cores and the size of RAM allows to increase the background merge throughput.

Memory optimized merges

ClickHouse does not necessarily load all partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. to be merged into memory at once, as sketched in the previous example. Based on several factors, and to reduce memory consumption (sacrificing merge speed), so-called vertical merging loads and merges partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. by chunks of blocks instead of in one go.

Merge mechanics

The diagram below illustrates how a single background merge thread in ClickHouse merges partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. (by default, without vertical merging):


The part merging is performed in several steps:

① Decompression & Loading: The compressed binary column files from the partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. to be merged are decompressed and loaded into memory.

② Merging: The data is merged into larger column files.

③ Indexing: A new sparse primary index is generated for the merged column files.

④ Compression & Storage: The new column files and index are compressed and saved in a new directory representing the merged data part.

Additional metadata in data parts, such as secondary data skipping indexes, column statistics, checksums, and min-max indexes, is also recreated based on the merged column files. We omitted these details for simplicity.

The mechanics of step ② depend on the specific MergeTree engine used, as different engines handle merging differently. For example, rows may be aggregated or replaced if outdated. As mentioned earlier, this approach offloads all data processing to background merges, enabling super-fast inserts by keeping write operations lightweight and efficient.

Next, we will briefly outline the merge mechanics of specific engines in the MergeTreeA MergeTree in ClickHouse is a table engine designed for high data ingest rates and large data volumes. It is the core storage engine in ClickHouse, providing features such as columnar storage, custom partitioning, sparse primary indexes, and support for background data merges. family.

Standard merges

The diagram below illustrates how partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. in a standard MergeTree table are merged:


The DDL statement in the diagram above creates a MergeTree table with a sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key. (town, street), meaning data on disk is sorted by these columns, and a sparse primary index is generated accordingly.

The ① decompressed, pre-sorted table columns are ② merged while preserving the table's global sorting order defined by the table's sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key., ③ a new sparse primary index is generated, and ④ the merged column files and index are compressed and stored as a new data part on disk.

Replacing merges

Part merges in a ReplacingMergeTree table work similarly to standard merges, but only the most recent version of each row is retained, with older versions being discarded:


The DDL statement in the diagram above creates a ReplacingMergeTree table with a sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key. (town, street, id), meaning data on disk is sorted by these columns, with a sparse primary index generated accordingly.

The ② merging works similarly to a standard MergeTree table, combining decompressed, pre-sorted columns while preserving the global sorting order.

However, the ReplacingMergeTree removes duplicate rows with the same sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key., keeping only the most recent row based on the creation timestamp of its containing part.


Summing merges

Numeric data is automatically summarized during merges of partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. from a SummingMergeTree table:


The DDL statement in the diagram above defines a SummingMergeTree table with town as the sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key., meaning that data on disk is sorted by this column and a sparse primary index is created accordingly.

In the ② merging step, ClickHouse replaces all rows with the same sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key. with a single row, summing the values of numeric columns.

Aggregating merges

The SummingMergeTree table example from above is a specialized variant of the AggregatingMergeTree table, allowing automatic incremental data transformation by applying any of 90+ aggregation functions during part merges:


The DDL statement in the diagram above creates an AggregatingMergeTree table with town as the sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key., ensuring data is ordered by this column on disk and a corresponding sparse primary index is generated.

During ② merging, ClickHouse replaces all rows with the same sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key. with a single row storing partial aggregation states (e.g. a sum and a count for avg()). These states ensure accurate results through incremental background merges.