ClickHouse has long been a powerhouse for analytical workloads, but handling semi-structured data like JSON used to require workarounds. With the 25.3 release, the JSON data type is now stable, bringing native support for storing and querying JSON documents without extra parsing overhead.
This means faster queries, cleaner schema design, and better performance for applications dealing with semi-structured data. In Part 1 of our deep dive into the JSON type, we’ll explore how to define JSON columns, extract sub-columns, and work with nested objects efficiently.
But before you start using it, make sure JSON support is enabled:
SET enable_json_type = 1;
Defining a JSON Column
To declare a column with the JSON data type, use the following syntax:
<column_name> JSON
(
max_dynamic_paths=N,
max_dynamic_types=M,
some.path TypeName,
SKIP path.to.skip,
SKIP REGEXP 'paths_regexp'
)
Here’s what each parameter does:
max_dynamic_paths
: Limits the number of distinct JSON paths stored as sub-columns. If exceeded, additional paths are grouped together. (Default: 1024)max_dynamic_types
: Limits the number of unique data types that can exist under a single JSON key. If the limit is reached, new types are converted toString
. (Default: 32)some.path TypeName
: Specifies the expected data type for a specific JSON path. This ensures better type consistency.SKIP path.to.skip
: Excludes a specific JSON key from being stored. If it’s a nested object, the entire object is skipped.SKIP REGEXP 'paths_regexp'
: Uses a regular expression to exclude multiple JSON paths from storage.
Storing JSON Data
Let’s see how to create a table with a JSON column and insert some data.
Example: Creating and Populating a JSON Table
CREATE TABLE json_example (data JSON) ENGINE = Memory;
INSERT INTO json_example VALUES
('{"user": {"id": 101, "name": "Alice"}, "actions": ["click", "scroll", "purchase"]}'),
('{"user": {"id": 102, "name": "Bob"}, "message": "Hello, ClickHouse!"}'),
('{"user": {"id": 103, "name": "Charlie"}, "actions": ["view", "like"]}');
Now, let’s query the stored JSON:
SELECT data FROM json_example;
ClickHouse keeps the JSON structure intact while storing it efficiently under the hood.
Reading JSON Paths as Sub-Columns in ClickHouse
One of the powerful features of the JSON data type in ClickHouse is its ability to extract JSON paths as sub-columns. This means you can directly access specific fields inside JSON documents instead of parsing them manually.
If a JSON path is not explicitly defined in the column declaration, it is treated as Dynamic, meaning ClickHouse infers its type on the fly.
Defining JSON Paths as Sub-Columns
Let’s create a table where we define a specific JSON path (user.id
) as an integer sub-column, while skipping a field (user.secret
).
Example: Creating and Inserting Data
CREATE TABLE user_data (
json JSON(user.id UInt32, SKIP user.secret)
) ENGINE = Memory;
INSERT INTO user_data VALUES
('{"user": {"id": 101, "name": "Alice", "secret": "xyz"}, "status": "active"}'),
('{"user": {"id": 102}, "status": "inactive"}'),
('{"user": {"name": "Charlie"}, "status": "pending"}');
Now, let’s query the full JSON column:
SELECT json FROM user_data;
┌─json─────────────────────────────────────────┐
│ {"user":{"id":101,"name":"Alice"},"status":"active"} │
│ {"user":{"id":102},"status":"inactive"} │
│ {"user":{"name":"Charlie"},"status":"pending"} │
└────────────────────────────────────────────────┘
Notice that the user.secret
field is removed, as we specified SKIP user.secret
.
Querying JSON Paths as Sub-Columns
Now, let’s extract specific fields directly:
SELECT json.user.id, json.user.name, json.status FROM user_data;
┌─json.user.id─┬─json.user.name─┬─json.status─┐
│ 101 │ Alice │ active │
│ 102 │ ᴺᵁᴸᴸ │ inactive │
│ 0 │ Charlie │ pending │
└──────────────┴───────────────┴──────────────┘
Key Observations:
- If a requested JSON path doesn’t exist in a row, ClickHouse fills it with
NULL
. - If a numeric field is missing (
json.user.id
in the last row), ClickHouse defaults to0
.
Reading JSON Sub-Objects as Sub-Columns in ClickHouse
ClickHouse allows you to extract nested JSON objects as sub-columns, making it easier to work with deeply structured data. Instead of manually parsing nested objects, you can use the special syntax json.^path
to retrieve them directly.
However, keep in mind that reading sub-objects can be inefficient since it may require scanning the entire JSON structure.
Defining Nested JSON Sub-Columns
Let’s create a table where we store JSON with nested objects:
Example: Creating and Inserting Data
CREATE TABLE orders (
json JSON
) ENGINE = Memory;
INSERT INTO orders VALUES
('{"customer": {"id": 1, "name": "Alice"}, "order": {"id": 101, "total": 50.5}}'),
('{"customer": {"id": 2, "name": "Bob"}, "order": {"id": 102}}'),
('{"customer": {"id": 3}, "order": {"id": 103, "total": 75.0}}');
Extracting Nested JSON Objects
Now, let’s read sub-objects as separate columns:
SELECT json.^customer, json.^order FROM orders;
┌─json.^customer────────┬─json.^order───────────┐
│ {"id":1,"name":"Alice"} │ {"id":101,"total":50.5} │
│ {"id":2,"name":"Bob"} │ {"id":102} │
│ {"id":3} │ {"id":103,"total":75.0} │
└────────────────────────┴────────────────────────┘
Key Observations:
- Each nested object is stored as a sub-column instead of a single JSON blob.
- Missing fields result in empty objects (
{}
), not NULL.
While this method is convenient, reading sub-objects requires scanning the entire JSON structure, which can be slow on large datasets. If performance is a concern, consider:
- Extracting key fields as explicit sub-columns instead of full objects.
- Using
JSONExtract
functions for targeted queries.
Conclusion
The native JSON type in ClickHouse makes handling semi-structured data much easier by allowing direct storage and querying of JSON documents. Whether you’re working with flat key-value pairs or deeply nested objects, ClickHouse offers a range of powerful features to help you structure your queries efficiently.
However, as datasets grow, performance considerations become crucial. In Part 2, we’ll explore:
- How ClickHouse reads JSON from different formats (JSONEachRow, TSV, CSV, etc.)
- Handling dynamic path limits inside JSON
- Comparing JSON values and objects
- Pro tips for using the JSON type efficiently
Stay tuned, there’s much more to uncover.
References
https://clickhouse.com/docs/sql-reference/data-types/newjson
https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql
Photo by 42 North: https://www.pexels.com/photo/tilt-shift-photography-of-birds-1275680/