Go is often praised for its concurrency model. You’ll hear things like “goroutines are lightweight” and “channels make communication easy.” But what does that really mean, and how does Go achieve it internally?
This post is for engineers who’ve written some Go code, maybe used go func() and channels, but want to understand what’s happening beneath the surface. We’ll take a closer look at how Go’s runtime schedules goroutines, manages stacks, and avoids the typical overhead of OS threads.
What Makes Go’s Concurrency Different?
Most languages either:
- Use OS threads directly (e.g., Java, C++)
- Or rely heavily on event loops and callbacks (e.g., JavaScript – especially in environments like Node.js)
Go takes a middle path: it introduces goroutines, which are not OS threads but instead are managed by Go’s own runtime scheduler. This is what makes Go’s concurrency model powerful.
You get the feel of threads, without the cost of creating and managing them yourself.
What Is a Goroutine, Really?
When you write:
go func() { /* ... */ }()
You’re starting a goroutine, which is:
- A lightweight, user-space thread
- Scheduled by Go’s runtime, not the OS
- Backed by a dynamically growing and shrinking stack (starting at ~2KB)
Unlike traditional threads that need a large fixed-size stack (e.g., 1MB), goroutines start small and grow as needed. This allows you to spawn thousands or even millions of them without crashing your system.
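As a rough illustration (the goroutine count here is arbitrary), the sketch below spawns 100,000 goroutines and waits for them all to finish; trying the same experiment with one OS thread per task would exhaust memory long before completing:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100_000 // far more than you could reasonably create as OS threads

	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}

	wg.Wait()
	fmt.Println("goroutines completed:", counter)
}
```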
The Go Scheduler: M:N Scheduling
Go uses a work-stealing M:N scheduler, where:
- M: number of OS threads
- N: number of goroutines
- The scheduler maps many goroutines (G) onto fewer OS threads (M) using logical processors (P)
Here’s how it works under the hood:
- G represents a goroutine.
- M represents an OS thread.
- P is a logical processor, responsible for executing goroutines and managing their run queues.
At runtime:
- Each P has its own run queue of goroutines.
- An M (OS thread) is assigned to a P.
- If a P's queue is empty, it can steal goroutines from other Ps (work stealing).
- System calls that block (e.g., file I/O) don't stall the whole program: the runtime detaches the P from the blocked thread and hands it to another M, so the remaining goroutines keep running.
This design ensures non-blocking behavior and keeps CPU cores utilized efficiently.
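You can observe some of this from the runtime package. The sketch below (the goroutine count and sleep duration are arbitrary) prints how many Ps are available and how many goroutines are in flight; running any program with GODEBUG=schedtrace=1000 additionally prints a scheduler summary every second:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	// GOMAXPROCS(0) reports the current number of Ps without changing it.
	fmt.Println("logical processors (P):", runtime.GOMAXPROCS(0))
	fmt.Println("machine CPUs:", runtime.NumCPU())

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(100 * time.Millisecond) // parked by the scheduler, not busy on a thread
		}()
	}

	fmt.Println("goroutines in flight:", runtime.NumGoroutine())
	wg.Wait()
}
```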
Stack Management
Each goroutine starts with a small stack (~2KB), which grows and shrinks dynamically.
Key points:
- Go avoids fixed-size stacks, which helps reduce memory footprint.
- The runtime uses a technique called segmented stacks (in older versions) and stack copying (in modern versions) to grow stacks as needed.
- This dynamic resizing allows goroutines to be lightweight and memory-efficient.
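One rough way to see stack growth in action (the recursion depth and per-frame padding below are arbitrary) is to run a deeply recursive function on a goroutine and compare runtime.MemStats.StackInuse before and while that goroutine is alive:

```go
package main

import (
	"fmt"
	"runtime"
)

// deep recurses n times, keeping some data in every frame so the goroutine's
// stack must grow; the runtime copies it to a larger allocation as needed.
func deep(n int, pad [64]byte) int {
	if n == 0 {
		return int(pad[0])
	}
	pad[0]++
	return deep(n-1, pad)
}

func main() {
	var before, during runtime.MemStats
	runtime.ReadMemStats(&before)

	done := make(chan struct{})
	go func() {
		deep(100_000, [64]byte{})     // far deeper than the initial ~2KB stack allows
		runtime.ReadMemStats(&during) // the grown stack is still live at this point
		close(done)
	}()
	<-done

	fmt.Printf("stack in use: %d KB before, %d KB with the deep goroutine alive\n",
		before.StackInuse/1024, during.StackInuse/1024)
}
```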
Garbage Collection and Concurrency
Go’s garbage collector (GC) runs concurrently with your goroutines, with only brief stop-the-world pauses, which plays a critical role in keeping concurrency smooth.
As goroutines are scheduled and descheduled frequently:
- The GC must track and clean memory without long pauses.
- Modern versions of Go keep GC pauses short (typically well under a millisecond), making Go more predictable under load.
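A quick way to get a feel for this is to read the GC counters from runtime.MemStats after churning through some allocations; the allocation size and loop count in this sketch are arbitrary:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Churn through allocations so the collector has work to do.
	var sink []byte
	for i := 0; i < 20_000; i++ {
		sink = make([]byte, 64*1024) // the previous slice becomes garbage
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("last allocation: %d bytes, GC cycles: %d, total pause: %v\n",
		len(sink), m.NumGC, time.Duration(m.PauseTotalNs))
}
```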
Channels: Coordination, Not Parallelism
Channels in Go are a way to synchronize and communicate between goroutines, not a mechanism for parallelism itself.
Internally:
- Channels are implemented as a runtime struct (hchan) containing a lock, an optional buffer, and queues of goroutines waiting to send or receive.
- Sending to or receiving from a channel involves locking (though often very fast).
- Buffered channels use ring buffers; unbuffered channels synchronize directly.
Important: channels don’t create concurrency — goroutines do. Channels just help coordinate them.
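A small worker-pool sketch makes the division of labor clear (the worker and job counts are arbitrary): the goroutines provide the concurrency, the channels only coordinate it:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan int, 4) // buffered: sends don't block until the buffer is full
	results := make(chan int) // unbuffered: each send waits for a matching receive

	var wg sync.WaitGroup
	for w := 0; w < 3; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- j * j
			}
		}()
	}

	// Feed the jobs channel and close it so the workers' range loops end.
	go func() {
		for i := 1; i <= 8; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	// Close results once every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		fmt.Println(r)
	}
}
```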
What Happens During a go func()?
Here’s what happens step-by-step when you run a goroutine:
- The compiler turns the go statement into a call into the runtime (runtime.newproc).
- The runtime creates a new G (goroutine descriptor) with a small stack.
- It puts the G on the run queue of the current P.
- The scheduler eventually picks it up and runs it on an M (OS thread).
If the function blocks (e.g., on I/O), the G is put into a waiting state and the P carries on running other goroutines, either on the same M or on a fresh one if the original thread is stuck in a system call.
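Here’s a minimal sketch of that parking behavior, using a channel receive as the blocking operation (the sleeps are only there to give the other goroutines time to run and print):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	blocked := make(chan struct{})

	// This goroutine blocks immediately on a channel receive.
	// The scheduler parks it; it costs a G descriptor and a small stack,
	// not an OS thread sitting idle.
	go func() {
		<-blocked
		fmt.Println("unblocked")
	}()

	// Other goroutines keep running on the same small set of threads.
	for i := 0; i < 3; i++ {
		go func(id int) {
			fmt.Println("worker", id, "running")
		}(i)
	}

	time.Sleep(50 * time.Millisecond) // give the workers a chance to run
	fmt.Println("goroutines alive:", runtime.NumGoroutine())

	close(blocked)
	time.Sleep(50 * time.Millisecond) // give the parked goroutine time to print
}
```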
Why It Matters
Understanding Go’s concurrency internals is more than trivia:
- It helps you write better code — e.g., avoid goroutine leaks.
- You can reason about performance and bottlenecks.
- It demystifies what “lightweight concurrency” actually means.
For example, if you’re running thousands of goroutines but seeing high latency, the issue is often goroutines blocked on channels or too few Ps (tunable via GOMAXPROCS), not the sheer number of goroutines.
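A common leak, for instance, is a goroutine stuck sending on a channel that nobody reads. The sketch below shows the problem and one conventional fix using context cancellation; the function names are purely illustrative:

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// leaky starts a goroutine that blocks forever because nobody reads ch.
func leaky() {
	ch := make(chan int)
	go func() {
		ch <- 42 // no receiver: this goroutine is stuck for the life of the program
	}()
}

// fixed gives the goroutine a way out: it exits when the context is cancelled.
func fixed(ctx context.Context) {
	ch := make(chan int)
	go func() {
		select {
		case ch <- 42:
		case <-ctx.Done(): // the caller gave up; exit instead of leaking
		}
	}()
}

func main() {
	for i := 0; i < 100; i++ {
		leaky()
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines after leaky calls:", runtime.NumGoroutine())

	ctx, cancel := context.WithCancel(context.Background())
	for i := 0; i < 100; i++ {
		fixed(ctx)
	}
	cancel()
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines after fixed calls:", runtime.NumGoroutine())
}
```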
Conclusion
Go’s concurrency model works because it offloads scheduling from the OS to the Go runtime. The M:N scheduler, lightweight stack management, and cooperative blocking all contribute to the performance and scalability that Go is known for.
Goroutines are not magic — they’re well-engineered, predictable, and efficient.
