In modern data engineering pipelines, it is common to work with columnar storage formats like Apache Parquet for efficient data storage and transmission. At the same time, ClickHouse is increasingly used for real-time analytics due to its high-performance OLAP engine.
This article demonstrates a practical approach to ingesting data from a Parquet file into a ClickHouse database using the Go programming language. We will walk through the following steps:
- Reading a Parquet file using
parquet-go - Connecting to ClickHouse using the native Go driver
- Creating a table in ClickHouse
- Inserting records programmatically
Prerequisites
To follow this guide, ensure the following Go packages are installed:
github.com/ClickHouse/clickhouse-go/v2– Go client for ClickHousegithub.com/xitongsys/parquet-go– Parquet reader for Go
Install them using:
go get github.com/ClickHouse/clickhouse-go/v2
go get github.com/xitongsys/parquet-go/reader
go get github.com/xitongsys/parquet-go-source/local
Also ensure that a ClickHouse instance is running locally or accessible via HTTP.
The Complete Go Code(with Step-by-Step Explanation)
package main
This tells Go that this is the main package – the entry point of your application. Every runnable Go program starts with package main.
import (
"database/sql"
"fmt"
"log"
_ "github.com/ClickHouse/clickhouse-go/v2"
"github.com/xitongsys/parquet-go-source/local"
"github.com/xitongsys/parquet-go/reader"
)
Let’s break this down:
"database/sql"– This is Go’s built-in database interface. It provides methods to connect to a database and run queries."fmt"– This is used to print messages to the console (for example, showing how many records were read)."log"– This is used to log errors. If something goes wrong, we uselog.Fatalfto stop the program and print the error.
Now the external packages:
"github.com/xitongsys/parquet-go/reader" – This is the actual reader that will parse Parquet files into Go structs.
_ "github.com/ClickHouse/clickhouse-go/v2" – This is the official Go driver for ClickHouse. The underscore means we are importing the package just for its side effects (its init() function registers the driver with Go’s SQL interface).
"github.com/xitongsys/parquet-go-source/local" – Allows us to read a Parquet file from the local file system.
Define the Data Model
type Car struct {
Model string `parquet:"name=model, type=BYTE_ARRAY, convertedtype=UTF8"`
MPG float64 `parquet:"name=mpg, type=DOUBLE"`
Cyl int32 `parquet:"name=cyl, type=INT32"`
Disp float64 `parquet:"name=disp, type=DOUBLE"`
HP int32 `parquet:"name=hp, type=INT32"`
Drat float64 `parquet:"name=drat, type=DOUBLE"`
Wt float64 `parquet:"name=wt, type=DOUBLE"`
Qsec float64 `parquet:"name=qsec, type=DOUBLE"`
VS int32 `parquet:"name=vs, type=INT32"`
Am int32 `parquet:"name=am, type=INT32"`
Gear int32 `parquet:"name=gear, type=INT32"`
Carb int32 `parquet:"name=carb, type=INT32"`
}
We start by creating a Go struct that matches the schema of the Parquet file. In our example, we’re using a car dataset called mtcars.
Why this step matters:
Parquet files are strongly typed. If your struct doesn’t match the file schema exactly (including type and field names), you’ll get errors or wrong data.
Main Function Starts
func main() {
Everything inside this function is the logic we want to run.
Step 1: Open the Parquet File
fr, err := local.NewLocalFileReader("/home/sanjeev/Downloads/mtcars.parquet")
if err != nil {
log.Fatalf("Can't open Parquet file: %v", err)
}
defer fr.Close()
This opens the Parquet file from your local system.
If the file path is wrong or the file is unreadable, it prints the error and stops.
defer fr.Close() ensures the file gets closed once we’re done reading.
Step 2: Create the Parquet Reader
pr, err := reader.NewParquetReader(fr, new(Car), 4)
if err != nil {
log.Fatalf("Can't create Parquet reader: %v", err)
}
defer pr.ReadStop()
This sets up a reader to convert rows in the Parquet file into Car structs.
The 4 is the number of goroutines to use when reading. It helps with performance.
defer pr.ReadStop() ensures the reader stops after use.
Step 3: Read Data From File
numRows := int(pr.GetNumRows())
fmt.Printf("Found %d rows in Parquet file\n", numRows)
batch := make([]Car, 32)
if err := pr.Read(&batch); err != nil {
log.Fatalf("Read error: %v", err)
}
fmt.Printf("Read %d records from Parquet file\n", len(batch))
GetNumRows() tells us how many rows are in the file (helpful for logging).
We create a slice called batch to hold up to 32 records.
pr.Read(&batch) fills the slice with data from the file.
Note: This reads only the first 32 rows. For real use, you would loop through all rows in chunks.
Step 4: Connect to ClickHouse
conn, err := sql.Open("clickhouse", "http://default:<YourClickHousePassword>@localhost:8124/default")
if err != nil {
log.Fatalf("ClickHouse connection failed: %v", err)
}
defer conn.Close()
This creates a connection to ClickHouse.
If ClickHouse is not running, or credentials are wrong, it will fail.
defer conn.Close() ensures the connection is closed when done.
Step 5 : Create the Table
_, err = conn.Exec(`
CREATE TABLE IF NOT EXISTS cars (
model String,
mpg Float64,
cyl Int32,
disp Float64,
hp Int32,
drat Float64,
wt Float64,
qsec Float64,
vs Int32,
am Int32,
gear Int32,
carb Int32
) ENGINE = MergeTree()
ORDER BY model
`)
if err != nil {
log.Fatalf("Create table failed: %v", err)
}
This creates the cars table if it doesn’t exist.
The schema must match both the Parquet file and Go struct.
MergeTree is the default engine for most ClickHouse tables.
Step 6 : Insert Data into ClickHouse
tx, err := conn.Begin()
if err != nil {
log.Fatalf("Transaction begin failed: %v", err)
}
stmt, err := tx.Prepare(`
INSERT INTO cars (
model, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`)
if err != nil {
log.Fatalf("Prepare statement failed: %v", err)
}
defer stmt.Close()
for _, r := range batch {
_, err := stmt.Exec(
r.Model, r.MPG, r.Cyl, r.Disp, r.HP,
r.Drat, r.Wt, r.Qsec, r.VS, r.Am,
r.Gear, r.Carb,
)
if err != nil {
log.Fatalf("Insert failed: %v", err)
}
}
if err := tx.Commit(); err != nil {
log.Fatalf("Commit failed: %v", err)
}
fmt.Println("Parquet data ingested into ClickHouse successfully.")
We begin a transaction using conn.Begin(). A transaction is like a container that holds all our insert actions. If something goes wrong halfway, we can cancel everything so that our table doesn’t end up with incomplete or incorrect data.
We prepare an SQL insert command using tx.Prepare(...). This command defines how we want to insert data into the cars table. The ? symbols are placeholders where our actual values will go.
We use defer stmt.Close() to make sure the prepared insert statement is properly closed after we’re done. This is good practice and helps free up resources.
We loop through the batch, which contains the car records we read earlier from the Parquet file. For each record, we use stmt.Exec(...) to insert it into the table. The values we pass in must be in the same order as the table columns.
If any single insert fails during the loop, the program will stop immediately and print an error message. This helps us catch problems early.
After the loop finishes inserting all the records, we use tx.Commit() to save everything to the database. If we don’t call Commit(), none of the inserts will be saved.
Finally, we print a message saying the ingestion was successful, so we know the whole process worked without any errors.
Troubleshooting Steps
If things don’t work, here’s how you can debug the issues step by step:
1. Parquet File Not Opening
If you get the error:Can't open Parquet file: ...
- Check that the file path is correct.
- Make sure the file exists and has read permissions.
- Verify that the file is actually a Parquet file (not corrupted).
2. Parquet Reader Fails to Create
If the error says:Can't create Parquet reader: ...
- Double-check that your struct fields match the schema of the Parquet file.
- Use a tool like
parquet-tools schema yourfile.parquetto inspect the file’s schema. - Make sure the field names and types (like
Float64,Int32,String) are exactly right.
3. ClickHouse Connection Fails
If you see:ClickHouse connection failed: ...
- Is ClickHouse running? Check with
clickhouse-clientor try openinghttp://localhost:8123in your browser. - Are the credentials correct? The default user is
defaultwith an empty password. - Is the port correct? ClickHouse typically listens on
8123for HTTP and9000for TCP.
4. Insert Statement Fails
If the program fails during insert:
- Make sure the table schema in ClickHouse matches your struct exactly.
- Check the order of values in the
INSERTstatement – they must match the columns. - Look at the error message – it often tells you which field caused the problem.
5. Data Not Visible in ClickHouse
If the program runs but you see no data in the table:
- Run
SELECT count(*) FROM carsin ClickHouse to verify. - Check that the transaction was committed (
tx.Commit()was not skipped). - Confirm that you are connecting to the same database as you’re querying from.
Conclusion
This code is a great starting point for any workflow where you need to bring Parquet data into ClickHouse. Once you understand the flow – define a struct, read a file, connect, create table, insert data – you can extend it to handle larger files, use cloud storage, or support multiple files in a batch.
Looking for Expert ClickHouse Solutions?
At Quantrail, we offer a fully managed ClickHouse service, seamless migration assistance, and dedicated service contracts to help businesses optimize their analytics stack. Whether you need a hassle-free ClickHouse deployment, expert support, or help transitioning from another database, we’ve got you covered. Let’s talk about how we can accelerate your analytics!
References
https://www.tablab.app/parquet/sample
https://github.com/xitongsys/parquet-go
