JSON Datatype in ClickHouse – Part 1

ClickHouse has long been a powerhouse for analytical workloads, but handling semi-structured data like JSON used to require workarounds. With the 25.3 release, the JSON data type is now stable, bringing native support for storing and querying JSON documents without extra parsing overhead.

This means faster queries, cleaner schema design, and better performance for applications dealing with semi-structured data. In Part 1 of our deep dive into the JSON type, we’ll explore how to define JSON columns, extract sub-columns, and work with nested objects efficiently.

But before you start using it, make sure JSON support is enabled:

SET enable_json_type = 1;

Defining a JSON Column

To declare a column with the JSON data type, use the following syntax:

<column_name> JSON
(
    max_dynamic_paths=N, 
    max_dynamic_types=M, 
    some.path TypeName, 
    SKIP path.to.skip, 
    SKIP REGEXP 'paths_regexp'
)

Here’s what each parameter does:

  • max_dynamic_paths: Limits the number of distinct JSON paths stored as sub-columns. If exceeded, additional paths are grouped together. (Default: 1024)
  • max_dynamic_types: Limits the number of unique data types that can exist under a single JSON key. If the limit is reached, new types are converted to String. (Default: 32)
  • some.path TypeName: Specifies the expected data type for a specific JSON path. This ensures better type consistency.
  • SKIP path.to.skip: Excludes a specific JSON key from being stored. If it’s a nested object, the entire object is skipped.
  • SKIP REGEXP 'paths_regexp': Uses a regular expression to exclude multiple JSON paths from storage.

Storing JSON Data

Let’s see how to create a table with a JSON column and insert some data.

Example: Creating and Populating a JSON Table

CREATE TABLE json_example (data JSON) ENGINE = Memory;

INSERT INTO json_example VALUES 
    ('{"user": {"id": 101, "name": "Alice"}, "actions": ["click", "scroll", "purchase"]}'), 
    ('{"user": {"id": 102, "name": "Bob"}, "message": "Hello, ClickHouse!"}'), 
    ('{"user": {"id": 103, "name": "Charlie"}, "actions": ["view", "like"]}');

Now, let’s query the stored JSON:

SELECT data FROM json_example;

ClickHouse keeps the JSON structure intact while storing it efficiently under the hood.

Reading JSON Paths as Sub-Columns in ClickHouse

One of the powerful features of the JSON data type in ClickHouse is its ability to extract JSON paths as sub-columns. This means you can directly access specific fields inside JSON documents instead of parsing them manually.

If a JSON path is not explicitly defined in the column declaration, it is treated as Dynamic, meaning ClickHouse infers its type on the fly.

Defining JSON Paths as Sub-Columns

Let’s create a table where we define a specific JSON path (user.id) as an integer sub-column, while skipping a field (user.secret).

Example: Creating and Inserting Data

CREATE TABLE user_data (
    json JSON(user.id UInt32, SKIP user.secret)
) ENGINE = Memory;

INSERT INTO user_data VALUES 
    ('{"user": {"id": 101, "name": "Alice", "secret": "xyz"}, "status": "active"}'),
    ('{"user": {"id": 102}, "status": "inactive"}'),
    ('{"user": {"name": "Charlie"}, "status": "pending"}');

Now, let’s query the full JSON column:

SELECT json FROM user_data;
┌─json─────────────────────────────────────────┐
│ {"user":{"id":101,"name":"Alice"},"status":"active"} │
│ {"user":{"id":102},"status":"inactive"}              │
│ {"user":{"name":"Charlie"},"status":"pending"}       │
└────────────────────────────────────────────────┘

Notice that the user.secret field is removed, as we specified SKIP user.secret.

Querying JSON Paths as Sub-Columns

Now, let’s extract specific fields directly:

SELECT json.user.id, json.user.name, json.status FROM user_data;
┌─json.user.id─┬─json.user.name─┬─json.status─┐
│         101  │ Alice          │ active      │
│         102  │ ᴺᵁᴸᴸ          │ inactive    │
│           0  │ Charlie        │ pending     │
└──────────────┴───────────────┴──────────────┘

Key Observations:

  • If a requested JSON path doesn’t exist in a row, ClickHouse fills it with NULL.
  • If a numeric field is missing (json.user.id in the last row), ClickHouse defaults to 0.

Reading JSON Sub-Objects as Sub-Columns in ClickHouse

ClickHouse allows you to extract nested JSON objects as sub-columns, making it easier to work with deeply structured data. Instead of manually parsing nested objects, you can use the special syntax json.^path to retrieve them directly.

However, keep in mind that reading sub-objects can be inefficient since it may require scanning the entire JSON structure.

Defining Nested JSON Sub-Columns

Let’s create a table where we store JSON with nested objects:

Example: Creating and Inserting Data

CREATE TABLE orders (
    json JSON
) ENGINE = Memory;

INSERT INTO orders VALUES 
    ('{"customer": {"id": 1, "name": "Alice"}, "order": {"id": 101, "total": 50.5}}'),
    ('{"customer": {"id": 2, "name": "Bob"}, "order": {"id": 102}}'),
    ('{"customer": {"id": 3}, "order": {"id": 103, "total": 75.0}}');

Extracting Nested JSON Objects

Now, let’s read sub-objects as separate columns:

SELECT json.^customer, json.^order FROM orders;
┌─json.^customer────────┬─json.^order───────────┐
│ {"id":1,"name":"Alice"} │ {"id":101,"total":50.5} │
│ {"id":2,"name":"Bob"}   │ {"id":102}              │
│ {"id":3}               │ {"id":103,"total":75.0} │
└────────────────────────┴────────────────────────┘

Key Observations:

  • Each nested object is stored as a sub-column instead of a single JSON blob.
  • Missing fields result in empty objects ({}), not NULL.

While this method is convenient, reading sub-objects requires scanning the entire JSON structure, which can be slow on large datasets. If performance is a concern, consider:

  • Extracting key fields as explicit sub-columns instead of full objects.
  • Using JSONExtract functions for targeted queries.

Conclusion

The native JSON type in ClickHouse makes handling semi-structured data much easier by allowing direct storage and querying of JSON documents. Whether you’re working with flat key-value pairs or deeply nested objects, ClickHouse offers a range of powerful features to help you structure your queries efficiently.

However, as datasets grow, performance considerations become crucial. In Part 2, we’ll explore:

  • How ClickHouse reads JSON from different formats (JSONEachRow, TSV, CSV, etc.)
  • Handling dynamic path limits inside JSON
  • Comparing JSON values and objects
  • Pro tips for using the JSON type efficiently

Stay tuned, there’s much more to uncover.

References

https://clickhouse.com/docs/sql-reference/data-types/newjson

https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql

Photo by 42 North: https://www.pexels.com/photo/tilt-shift-photography-of-birds-1275680/