Go is often praised for its concurrency model. You’ll hear things like “goroutines are lightweight” and “channels make communication easy.” But what does that really mean, and how does Go achieve it internally?
This post is for engineers who’ve written some Go code, maybe used go func() and channels, but want to understand what’s happening beneath the surface. We’ll take a closer look at how Go’s runtime schedules goroutines, manages stacks, and avoids the typical overhead of OS threads.
What Makes Go’s Concurrency Different?
Most languages either:
- Use OS threads directly (e.g., Java, C++)
- Or rely heavily on event loops and callbacks (e.g., JavaScript – especially in environments like Node.js)
Go takes a middle path: it introduces goroutines, which are not OS threads but instead are managed by Go’s own runtime scheduler. This is what makes Go’s concurrency model powerful.
You get the feel of threads, without the cost of creating and managing them yourself.
What Is a Goroutine, Really?
When you write:
go func() { /* ... */ }()
You’re starting a goroutine, which is:
- A lightweight, user-space thread
- Scheduled by Go’s runtime, not the OS
- Backed by a dynamically growing and shrinking stack (starting at ~2KB)
Unlike traditional threads that need a large fixed-size stack (e.g., 1MB), goroutines start small and grow as needed. This allows you to spawn thousands or even millions of them without crashing your system.
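As a rough illustration (the goroutine count here is arbitrary), the sketch below spawns 100,000 goroutines and waits for them all to finish; trying the same experiment with one OS thread per task would exhaust memory long before completing:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100_000 // far more than you could reasonably create as OS threads

	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}

	wg.Wait()
	fmt.Println("goroutines completed:", counter)
}
```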
The Go Scheduler: M:N Scheduling
Go uses a work-stealing M:N scheduler, where:
- M: number of OS threads
- N: number of goroutines
- The scheduler maps many goroutines (G) onto fewer OS threads (M) using logical processors (P)
Here’s how it works under the hood:
- G represents a goroutine.
- M represents an OS thread.
- P is a logical processor, responsible for executing goroutines and managing their run queues.
At runtime:
- Each P has its own run queue of goroutines.
- An M (OS thread) is assigned to a P.
- If a P's queue is empty, it can steal goroutines from other Ps (work stealing).
- System calls that block (e.g., file I/O) don't stall the whole program: the runtime detaches the P from the blocked thread and hands it to another M, so the remaining goroutines keep running.
This design ensures non-blocking behavior and keeps CPU cores utilized efficiently.
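You can observe some of this from the runtime package. The sketch below (the goroutine count and sleep duration are arbitrary) prints how many Ps are available and how many goroutines are in flight; running any program with GODEBUG=schedtrace=1000 additionally prints a scheduler summary every second:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	// GOMAXPROCS(0) reports the current number of Ps without changing it.
	fmt.Println("logical processors (P):", runtime.GOMAXPROCS(0))
	fmt.Println("machine CPUs:", runtime.NumCPU())

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(100 * time.Millisecond) // parked by the scheduler, not busy on a thread
		}()
	}

	fmt.Println("goroutines in flight:", runtime.NumGoroutine())
	wg.Wait()
}
```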
Stack Management
Each goroutine starts with a small stack (~2KB), which grows and shrinks dynamically.
Key points:
- Go avoids fixed-size stacks, which helps reduce memory footprint.
- The runtime uses a technique called segmented stacks (in older versions) and stack copying (in modern versions) to grow stacks as needed.
- This dynamic resizing allows goroutines to be lightweight and memory-efficient.
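One rough way to see stack growth in action (the recursion depth and per-frame padding below are arbitrary) is to run a deeply recursive function on a goroutine and compare runtime.MemStats.StackInuse before and while that goroutine is alive:

```go
package main

import (
	"fmt"
	"runtime"
)

// deep recurses n times, keeping some data in every frame so the goroutine's
// stack must grow; the runtime copies it to a larger allocation as needed.
func deep(n int, pad [64]byte) int {
	if n == 0 {
		return int(pad[0])
	}
	pad[0]++
	return deep(n-1, pad)
}

func main() {
	var before, during runtime.MemStats
	runtime.ReadMemStats(&before)

	done := make(chan struct{})
	go func() {
		deep(100_000, [64]byte{})     // far deeper than the initial ~2KB stack allows
		runtime.ReadMemStats(&during) // the grown stack is still live at this point
		close(done)
	}()
	<-done

	fmt.Printf("stack in use: %d KB before, %d KB with the deep goroutine alive\n",
		before.StackInuse/1024, during.StackInuse/1024)
}
```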
Garbage Collection and Concurrency
Go’s garbage collector (GC) runs concurrently with your goroutines, with only brief stop-the-world pauses, which plays a critical role in keeping concurrency smooth.
As goroutines are scheduled and descheduled frequently:
- The GC must track and clean memory without long pauses.
- Modern versions of Go keep GC pauses short (typically well under a millisecond), making Go more predictable under load.
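A quick way to get a feel for this is to read the GC counters from runtime.MemStats after churning through some allocations; the allocation size and loop count in this sketch are arbitrary:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Churn through allocations so the collector has work to do.
	var sink []byte
	for i := 0; i < 20_000; i++ {
		sink = make([]byte, 64*1024) // the previous slice becomes garbage
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("last allocation: %d bytes, GC cycles: %d, total pause: %v\n",
		len(sink), m.NumGC, time.Duration(m.PauseTotalNs))
}
```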
Channels: Coordination, Not Parallelism
Channels in Go are a way to synchronize and communicate between goroutines, not a mechanism for parallelism itself.
Internally:
- Channels are implemented as a runtime struct (hchan) containing a lock, an optional buffer, and queues of goroutines waiting to send or receive.
- Sending to or receiving from a channel involves locking (though often very fast).
- Buffered channels use ring buffers; unbuffered channels synchronize directly.
Important: channels don’t create concurrency — goroutines do. Channels just help coordinate them.
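A small worker-pool sketch makes the division of labor clear (the worker and job counts are arbitrary): the goroutines provide the concurrency, the channels only coordinate it:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan int, 4) // buffered: sends don't block until the buffer is full
	results := make(chan int) // unbuffered: each send waits for a matching receive

	var wg sync.WaitGroup
	for w := 0; w < 3; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- j * j
			}
		}()
	}

	// Feed the jobs channel and close it so the workers' range loops end.
	go func() {
		for i := 1; i <= 8; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	// Close results once every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		fmt.Println(r)
	}
}
```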
What Happens During a go func()?
Here’s what happens step-by-step when you run a goroutine:
- The compiler turns the go statement into a call into the runtime (runtime.newproc).
- The runtime creates a new G (goroutine descriptor) with a small stack.
- It puts the G on the run queue of the current P.
- The scheduler eventually picks it up and runs it on an M (OS thread).
If the function blocks (e.g., on I/O), the G is put into a waiting state and the P carries on running other goroutines, either on the same M or on a fresh one if the original thread is stuck in a system call.
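Here’s a minimal sketch of that parking behavior, using a channel receive as the blocking operation (the sleeps are only there to give the other goroutines time to run and print):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	blocked := make(chan struct{})

	// This goroutine blocks immediately on a channel receive.
	// The scheduler parks it; it costs a G descriptor and a small stack,
	// not an OS thread sitting idle.
	go func() {
		<-blocked
		fmt.Println("unblocked")
	}()

	// Other goroutines keep running on the same small set of threads.
	for i := 0; i < 3; i++ {
		go func(id int) {
			fmt.Println("worker", id, "running")
		}(i)
	}

	time.Sleep(50 * time.Millisecond) // give the workers a chance to run
	fmt.Println("goroutines alive:", runtime.NumGoroutine())

	close(blocked)
	time.Sleep(50 * time.Millisecond) // give the parked goroutine time to print
}
```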
Why It Matters
Understanding Go’s concurrency internals is more than trivia:
- It helps you write better code — e.g., avoid goroutine leaks.
- You can reason about performance and bottlenecks.
- It demystifies what “lightweight concurrency” actually means.
For example, if you’re running thousands of goroutines but seeing high latency, the issue is often goroutines blocked on channels or too few Ps (tunable via GOMAXPROCS), not the sheer number of goroutines.
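A common leak, for instance, is a goroutine stuck sending on a channel that nobody reads. The sketch below shows the problem and one conventional fix using context cancellation; the function names are purely illustrative:

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// leaky starts a goroutine that blocks forever because nobody reads ch.
func leaky() {
	ch := make(chan int)
	go func() {
		ch <- 42 // no receiver: this goroutine is stuck for the life of the program
	}()
}

// fixed gives the goroutine a way out: it exits when the context is cancelled.
func fixed(ctx context.Context) {
	ch := make(chan int)
	go func() {
		select {
		case ch <- 42:
		case <-ctx.Done(): // the caller gave up; exit instead of leaking
		}
	}()
}

func main() {
	for i := 0; i < 100; i++ {
		leaky()
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines after leaky calls:", runtime.NumGoroutine())

	ctx, cancel := context.WithCancel(context.Background())
	for i := 0; i < 100; i++ {
		fixed(ctx)
	}
	cancel()
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines after fixed calls:", runtime.NumGoroutine())
}
```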
Conclusion
Go’s concurrency model works because it offloads scheduling from the OS to the Go runtime. The M:N scheduler, lightweight stack management, and cooperative blocking all contribute to the performance and scalability that Go is known for.
Goroutines are not magic — they’re well-engineered, predictable, and efficient.
