Go Profiling
Introduction
Performance profiling is an essential skill for any Go developer who wants to build efficient applications. Go includes a powerful built-in profiling toolkit called pprof that helps you identify bottlenecks, memory leaks, and other performance issues in your programs.
In this tutorial, we'll explore Go's profiling capabilities, understand how to use the different profiling tools, and learn how to interpret the results to improve your code's performance.
What is Profiling?
Profiling is the process of analyzing a program's behavior during execution to identify which parts consume the most resources. Instead of guessing where performance problems might be, profiling provides concrete data about:
- CPU usage (which functions take the most time?)
- Memory allocations (which parts of your code allocate the most memory?)
- Blocking operations (where is your code waiting for operations to complete?)
- Goroutine activity (are you creating too many goroutines or experiencing deadlocks?)
Let's get started with Go's profiling tools and see them in action!
Go's Profiling Toolkit: pprof
Go's primary profiling tool is pprof, which is part of the standard library in the runtime/pprof package. Additionally, the net/http/pprof package provides HTTP endpoints for profiling running web applications.
Types of Profiles
Go supports several types of profiles:
- CPU Profile: Shows where your program spends its CPU time
- Memory Profile: Shows memory allocation patterns
- Block Profile: Shows where goroutines block waiting for synchronization primitives (disabled by default; see the sketch after this list)
- Goroutine Profile: Shows stack traces of all current goroutines
- Mutex Profile: Shows contention on mutexes (also disabled by default)
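Unlike the CPU and heap profiles, the block and mutex profiles must be enabled explicitly through the runtime package. Here is a minimal sketch, assuming you want to sample every event (the rates and the block.prof filename are illustrative):

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
)

func main() {
    // Sample every blocking event; larger rates sample less often.
    runtime.SetBlockProfileRate(1)
    // Sample every mutex contention event; larger fractions sample less often.
    runtime.SetMutexProfileFraction(1)

    // ... run the workload you want to profile here ...

    // Write the block profile; look up "mutex" the same way for mutex data.
    f, err := os.Create("block.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := pprof.Lookup("block").WriteTo(f, 0); err != nil {
        panic(err)
    }
}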
Setting Up Basic Profiling
Let's start with a simple example program that has some performance issues:
package main

import (
    "fmt"
    "os"
    "runtime/pprof"
)

func main() {
    // Create CPU profile file
    f, err := os.Create("cpu.prof")
    if err != nil {
        fmt.Printf("Could not create CPU profile: %v\n", err)
        return
    }
    defer f.Close()

    // Start CPU profiling
    if err := pprof.StartCPUProfile(f); err != nil {
        fmt.Printf("Could not start CPU profile: %v\n", err)
        return
    }
    defer pprof.StopCPUProfile()

    // Our inefficient program
    result := inefficientJoin(50000)
    fmt.Printf("Result length: %d\n", len(result))

    // Create memory profile
    f2, err := os.Create("mem.prof")
    if err != nil {
        fmt.Printf("Could not create memory profile: %v\n", err)
        return
    }
    defer f2.Close()

    // Write memory profile
    if err := pprof.WriteHeapProfile(f2); err != nil {
        fmt.Printf("Could not write memory profile: %v\n", err)
        return
    }

    fmt.Println("CPU and memory profiles created")
}

// This function is intentionally inefficient
func inefficientJoin(n int) string {
    result := ""
    for i := 0; i < n; i++ {
        // Inefficient string concatenation: each += copies the whole string
        result += fmt.Sprintf("number-%d,", i)
    }
    return result
}
When we run this program, it will create two profile files:
- cpu.prof: Records CPU usage
- mem.prof: Records memory allocations
Output:
Result length: 638890
CPU and memory profiles created
Analyzing Profiles with pprof
Let's analyze the profiles we've created. You can use the go tool pprof command to analyze them:
go tool pprof cpu.prof
This opens an interactive console where you can explore the profile:
Type: cpu
Time: Mar 2, 2023, 15:04:05
Duration: 200ms, Total samples = 180ms (90.00%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 170ms, 94.44% of 180ms total
Showing top 10 nodes out of 25
      flat  flat%   sum%        cum   cum%
      80ms 44.44% 44.44%       80ms 44.44%  runtime.concatstrings
      40ms 22.22% 66.67%      130ms 72.22%  main.inefficientJoin
      30ms 16.67% 83.33%       30ms 16.67%  runtime.mallocgc
      10ms  5.56% 88.89%       10ms  5.56%  fmt.Sprintf
      10ms  5.56% 94.44%       10ms  5.56%  runtime.convTstring
         0     0% 94.44%       10ms  5.56%  fmt.(*pp).doPrintf
         0     0% 94.44%       10ms  5.56%  fmt.(*pp).printArg
         0     0% 94.44%       10ms  5.56%  fmt.Fprintf
         0     0% 94.44%       10ms  5.56%  os.(*File).Write
         0     0% 94.44%       10ms  5.56%  runtime.convT2E
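top10 is just one console command. A few others are worth knowing (shown here without their output, which varies from run to run):

(pprof) list inefficientJoin    # annotated source with per-line samples
(pprof) peek concatstrings      # callers and callees of a function
(pprof) web                     # open the call graph in a browser (needs Graphviz)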
We can also visualize the profile as a graph by running:
go tool pprof -http=:8080 cpu.prof
This opens a web browser with an interactive visualization of the profile.
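The memory profile can be explored the same way. By default, heap profiles show memory currently in use; switching the sample index shows everything allocated over the program's lifetime (both are standard pprof flags):

go tool pprof -sample_index=inuse_space mem.prof    # live memory (the default)
go tool pprof -sample_index=alloc_space mem.prof    # total allocated, including freed memory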
Interpreting Profile Results
Looking at our CPU profile, we can see:
- runtime.concatstrings and main.inefficientJoin take up most of the CPU time (66.67% combined)
- Memory allocation (runtime.mallocgc) takes 16.67%
- fmt.Sprintf also consumes resources
The problem is clear: string concatenation in a loop is inefficient in Go because strings are immutable, so each concatenation creates a new string.
Optimizing Based on Profile Results
Let's improve our code based on the profiling results:
// Improved version using strings.Builder
func efficientJoin(n int) string {
    var builder strings.Builder
    // Pre-allocate approximate capacity (each entry is roughly 10 bytes)
    builder.Grow(n * 10)
    for i := 0; i < n; i++ {
        fmt.Fprintf(&builder, "number-%d,", i)
    }
    return builder.String()
}
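As an aside, fmt.Fprintf still routes every entry through fmt's formatting machinery. If a profile showed that dominating, a variant using strconv would avoid it entirely. This fastestJoin is a hypothetical further step, not one of the two functions benchmarked below:

// Hypothetical variant: strconv instead of fmt (requires import "strconv").
func fastestJoin(n int) string {
    var builder strings.Builder
    builder.Grow(n * 10) // same rough capacity estimate
    for i := 0; i < n; i++ {
        builder.WriteString("number-")
        builder.WriteString(strconv.Itoa(i))
        builder.WriteByte(',')
    }
    return builder.String()
}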
Now let's run a benchmark comparing both functions:
package main

import "testing"

func BenchmarkInefficientJoin(b *testing.B) {
    for i := 0; i < b.N; i++ {
        inefficientJoin(5000)
    }
}

func BenchmarkEfficientJoin(b *testing.B) {
    for i := 0; i < b.N; i++ {
        efficientJoin(5000)
    }
}
Run the benchmark:
go test -bench=. -benchmem
Sample output:
BenchmarkInefficientJoin-8             3     429876733 ns/op    238529064 B/op     10002 allocs/op
BenchmarkEfficientJoin-8              88      13587499 ns/op        51312 B/op         8 allocs/op
The optimized version:
- Is approximately 32x faster
- Uses 4,600x less memory
- Performs 1,250x fewer allocations
Profiling Web Applications
For web applications, Go provides the net/http/pprof package, which adds HTTP endpoints for profiling a running server:
package main

import (
    "fmt"
    "log"
    "net/http"
    _ "net/http/pprof" // Import for side effects: registers /debug/pprof handlers
    "strings"
)

func heavyHandler(w http.ResponseWriter, r *http.Request) {
    // Simulate expensive operation
    result := strings.Builder{}
    for i := 0; i < 10000; i++ {
        result.WriteString(fmt.Sprintf("number-%d,", i))
    }
    fmt.Fprintf(w, "Done: length=%d", result.Len())
}

func main() {
    http.HandleFunc("/heavy", heavyHandler)
    log.Println("Server starting on :8080")
    log.Println("Access profiling data at /debug/pprof/")
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Now you can access different profiling endpoints:
- http://localhost:8080/debug/pprof/ - Index page
- http://localhost:8080/debug/pprof/heap - Memory profile
- http://localhost:8080/debug/pprof/profile - 30-second CPU profile
- http://localhost:8080/debug/pprof/goroutine - Goroutine stack traces
To analyze a running web application:
go tool pprof http://localhost:8080/debug/pprof/profile
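The profile endpoint samples CPU for 30 seconds by default; the standard seconds query parameter adjusts that, and other profiles can be fetched by URL the same way:

go tool pprof http://localhost:8080/debug/pprof/profile?seconds=10
go tool pprof http://localhost:8080/debug/pprof/heap

Remember to send requests to /heavy while the CPU profile is being captured; otherwise the profile will record mostly idle time.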
Advanced Profiling: Tracing
For more detailed insights, Go provides execution tracing:
package main

import (
    "fmt"
    "os"
    "runtime/trace"
    "sync"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := trace.Start(f); err != nil {
        panic(err)
    }
    defer trace.Stop()

    // Your code here
    fmt.Println("Tracing in progress...")

    // Example workload
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            total := 0
            for j := 0; j < 1000000; j++ {
                total += j
            }
            fmt.Printf("Worker %d: %d\n", n, total)
        }(i)
    }

    // Wait for all goroutines to finish before the trace stops
    wg.Wait()
}
Run the trace viewer:
go tool trace trace.out
This opens a web interface showing:
- Goroutine execution
- Network/sync blocking
- System calls
- Garbage collection events
Continuous Profiling
For production applications, consider implementing continuous profiling:
- Periodically capture profiles (a minimal sketch follows this list)
- Store profiles with timestamps
- Compare profiles over time to detect regressions
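Here is a minimal sketch of the first two steps, writing a timestamped heap profile once a minute (the interval and filename scheme are illustrative; production setups usually ship profiles to central storage rather than local disk):

package main

import (
    "fmt"
    "os"
    "runtime/pprof"
    "time"
)

// captureHeapProfile writes one timestamped heap profile to disk.
func captureHeapProfile() error {
    name := fmt.Sprintf("heap-%s.prof", time.Now().Format("20060102-150405"))
    f, err := os.Create(name)
    if err != nil {
        return err
    }
    defer f.Close()
    return pprof.WriteHeapProfile(f)
}

func main() {
    ticker := time.NewTicker(time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        if err := captureHeapProfile(); err != nil {
            fmt.Println("profile capture failed:", err)
        }
    }
}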
Popular tools for continuous profiling:
- Google Cloud Profiler
- Datadog Continuous Profiler
- Pyroscope
- Parca
Best Practices
- Profile early and often: Don't wait until you have performance problems
- Focus on hot spots: Address the most significant issues first
- Benchmark before and after: Verify your optimizations actually help
- Consider the full picture: Sometimes CPU improvements come at the cost of memory usage
- Profile in realistic environments: Development machines may not show the same patterns as production
Memory Management Tips
Based on profiling, here are common memory optimization patterns:
- Pre-allocate slices when you know the approximate size (see the sketch after this list)
- Use object pools for frequently allocated/deallocated objects
- Watch for hidden allocations in interface conversions, string operations, etc.
- Use value types instead of pointers when appropriate
- Be mindful of closure captures which can prevent garbage collection
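To make the first two tips concrete, here is a short sketch (the types, sizes, and function names are illustrative):

package main

import (
    "bytes"
    "sync"
)

// Tip 1: pre-allocate when the final size is known in advance.
func squares(n int) []int {
    out := make([]int, 0, n) // capacity up front: append never reallocates
    for i := 0; i < n; i++ {
        out = append(out, i*i)
    }
    return out
}

// Tip 2: reuse frequently allocated objects with sync.Pool.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func render(s string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset() // pooled objects keep their old contents; clear before reuse
    defer bufPool.Put(buf)
    buf.WriteString("rendered: ")
    buf.WriteString(s)
    return buf.String()
}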
Summary
Go's profiling tools provide powerful insights into your application's performance:
- CPU profiling reveals computation bottlenecks
- Memory profiling shows allocation patterns and potential leaks
- Block profiling identifies concurrency issues
- Trace tool visualizes goroutine behavior and scheduling
By learning to use these tools effectively, you can make data-driven optimization decisions rather than relying on intuition or guesswork.
Additional Resources
- Go Blog: Profiling Go Programs
- Runtime pprof package documentation
- Dave Cheney: High Performance Go Workshop
- Practical Go Benchmarks repository
Exercises
- Use pprof to identify and fix performance issues in a simple program that sorts a large slice of integers
- Set up continuous profiling for a web server and monitor it under load
- Compare the performance of different data structures (map, slice, etc.) for your specific use case
- Use block profiling to identify contention issues in a concurrent program
- Profile a program before and after applying the optimization techniques described in this guide
Remember, profiling should be part of your regular development workflow, not just something you do when problems arise. Happy profiling!