> If Michael's analysis were accurate, I'd agree that there is a robustness > issue, but I don't think there is. See timeout.c:220: Actually that only sets a new timer after the nearest timeout has expired. The pattern I was seeing went like this: 1. Command occurs during which a signal was not delivered (due to our custom function). signal_pending is set and never cleared. 2. This command finishes normally, so the deadlock timeout is removed. As long as there are either no more active timeouts, or the nearest active timeout is still in the future, then no interval timer is set. 3. A later command starts, and again because the new timeout is in the future, no new interval timer is set. 4. This transaction gets into a deadlock, but because the interval timer is not running, ALRM is never received so the process is stuck. In fact once a backend gets into this state it is permanently stuck like this because new calls to schedule timeouts are always scheduling them for a future time. I was thinking that to improve robustness, we could add a check for `now < signal_due_at` to schedule_alarm line 300: if (signal_pending && now < signal_due_at && nearest_timeout >= signal_due_at) return; That way, even if we think we are due for a signal, if we are already past the time we were expecting it we go ahead and set a new one anyway. Even though the fault in this case was a faulty custom function, there could be other scenarios in which a signal isn't delivered (that bit of code Tom highlighted is specifically to cater for that, going by the comments). One other thing that struck me when reading this code: the variable signal_due_at is not declared as volatile sig_atomic_t, even though it is read from / written to by the signal handler. Maybe that could cause problems? One other question: I've fixed our custom function, so that it correctly restores any interval timers that were running, but of course if our function takes longer than the remaining time on the postgres timer the signal will be delivered late. Beyond the fact that a deadlock or statement timeout will take longer than expected, are there any other negative consequences? Are any of the timeouts time-critical such that being delayed by a few seconds would cause a problem? Thanks for any feedback. Regards Mike