A few weeks ago we had an incident where our RabbitMQ queue was blocked and no new messages were processed. Restarting the service temporarily resolved the issue, but the underlying problem was still there. It was caused by a mutex that stayed locked after a panic. Even though we recovered from the panic, the mutex remained locked.
We had a function that was surrounded by a mutex Lock/Unlock pair because it wrote to a map that was accessed concurrently. The function was quite simple and wasn’t supposed to panic, but it did. Luckily, all such places are covered by panic handlers (recover()). Even though the panic was properly recovered from, we noticed that the queue started getting blocked right after that particular panic.
A quick inspection of that code led to the realization that the mutex unlock wasn’t deferred. What’s the catch?
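// Recover from any panic raised further down in this function.
defer func() {
	if err := recover(); err != nil {
		fmt.Println("Recovered")
	}
}()
m.Lock()
functionCallThatPanics()
m.Unlock() // skipped when the call above panics, so m stays locked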
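The problem with the code above is that if the call panics (and for us it did), m.Unlock() is never executed. The panic unwinds the stack straight to the deferred recover handler and skips every statement after the panicking call. For us it was a bit more complex, as that function call was made in a loop. A deferred call runs only when the enclosing function returns, not at the end of each iteration, so the loop body had to be extracted into its own function.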
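To make the loop point concrete, here is a minimal sketch of that extraction. The `store` type and the `set` and `setAll` names are hypothetical stand-ins, not our production code; the only thing the sketch is meant to show is that moving the lock and the deferred unlock into a per-item function releases the mutex once per iteration.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a map shared between goroutines.
type store struct {
	mu   sync.Mutex
	data map[string]int
}

// set handles a single item. Because the lock and the deferred unlock live
// in their own function, the mutex is released at the end of every call,
// that is once per loop iteration, and also if the body panics.
func (s *store) set(key string, value int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// setAll is the loop. Writing `defer s.mu.Unlock()` directly in this loop
// body would postpone every unlock until setAll returns, so the second
// iteration would block forever on s.mu.Lock().
func (s *store) setAll(keys []string) {
	for i, k := range keys {
		s.set(k, i)
	}
}

func main() {
	s := &store{data: make(map[string]int)}
	s.setAll([]string{"a", "b", "c"})
	fmt.Println(len(s.data)) // prints 3
}
```

An anonymous function called inside the loop body would work as well; a named function is just easier to read and to test.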
Apart from fixing the actual panic, we resolved the issue by deferring the call to m.Unlock():
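// Recover from any panic raised further down in this function.
defer func() {
	if err := recover(); err != nil {
		fmt.Println("Recovered")
	}
}()
m.Lock()
defer m.Unlock() // runs on normal return and while a panic unwinds
functionCallThatPanics()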
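This way, the mutex is unlocked properly even in case of a panic: deferred calls run in last-in, first-out order while the panic unwinds, so m.Unlock() runs before the recover handler. The image below demonstrates the difference using fmt.Println().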