I recently joined a project where the codebase was written in Go. The codebase had numerous issues, but one stood out: the server restarted unexpectedly and frequently. The server logs showed the cause was panic-induced crashes in Goroutines. The codebase didn't recover from panics in every Goroutine it started.
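When writing APIs in Go, it's common practice to add recover middleware. A panic during request handling is then caught, logged, and turned into an error response. The HTTP server runs each handler in a Goroutine it manages, and the middleware's deferred recover covers that Goroutine. The standard net/http server also recovers handler panics on its own, but it only logs them and aborts the response.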
However, the HTTP server isn't the only source of Goroutines. Developers often start Goroutines themselves to run background jobs, handle asynchronous tasks, or perform intensive operations. These Goroutines can panic too, and neither the middleware nor the server's own recovery covers them. A recover only catches panics in the Goroutine where it is deferred, and an unrecovered panic in any Goroutine terminates the entire application.
Solution
The solution is straightforward: you must recover from panics in every Goroutine you create. A helper function can simplify this by centralizing panic recovery and logging. Below is an example of such a function:
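```go
import (
	"runtime"

	zlog "github.com/rs/zerolog/log" // assumed alias for zerolog's global logger package
)

// CatchPanic must be deferred directly: defer CatchPanic("name").
func CatchPanic(funcName string, fields ...map[string]any) {
	if err := recover(); err != nil {
		// Capture up to 8 KiB of the current Goroutine's stack trace.
		stack := make([]byte, 8192)
		stack = stack[:runtime.Stack(stack, false)]
		// Log() sets no level; "fatal" is a plain field because Fatal() would call os.Exit.
		logger := zlog.Logger.Log().Bytes("stack", stack).Str("level", "fatal").Interface("error", err)
		if len(fields) > 0 {
			logger.Fields(fields[0])
		}
		logger.Msgf("recovered from panic in %s", funcName)
	}
}
```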
CatchPanic takes a name identifying the function or job it protects (funcName) and an optional map of key-value fields for contextual logging. It is deferred directly, so its call to recover stops the panic. It then logs the error and stack trace using zerolog. The panicking Goroutine still exits, but the rest of the application keeps running.
Wrapping Goroutines Safely:
To protect your application, wrap each Goroutine with this helper function. Here’s an example:
```go
go func() {
	defer CatchPanic("myBackgroundJob")
	// do some heavy lifting
}()
```
With this approach, a panic in the Goroutine is recovered and logged, and the rest of the application keeps running. The work that Goroutine was doing is abandoned, so use the logged error and stack trace to diagnose and fix the underlying issue.
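Adding `defer` by hand at every call site still relies on developers remembering to do it. One option is a small wrapper that starts the Goroutine with recovery already in place. `safeGo` and its `onPanic` callback are hypothetical names used only for illustration. In a real project, `onPanic` would typically log the way CatchPanic does.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// safeGo is a hypothetical helper: it runs fn in a new Goroutine with a
// recover already deferred, so call sites cannot forget it.
// onPanic receives the Goroutine's name, the panic value, and its stack trace.
func safeGo(name string, onPanic func(name string, err any, stack []byte), fn func()) {
	go func() {
		defer func() {
			if err := recover(); err != nil {
				onPanic(name, err, debug.Stack())
			}
		}()
		fn()
	}()
}

func main() {
	recovered := make(chan string, 1)
	safeGo("myBackgroundJob", func(name string, err any, _ []byte) {
		recovered <- fmt.Sprintf("%s: %v", name, err)
	}, func() {
		panic("boom") // simulated failure in a background job
	})

	fmt.Println("recovered from", <-recovered) // recovered from myBackgroundJob: boom
	fmt.Println("process still running")
}
```

Passing the handler in as a parameter keeps the wrapper independent of any particular logger. The trade-off is that every call site must supply one, or you can wrap `safeGo` once with your project's default.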