Re: Undetected Deadlock

Tom Lane <tgl@xxxxxxxxxxxxx> · Sun, 06 Feb 2022 17:57:10 -0500

Michael Harris <harmic@xxxxxxxxx> writes:
>> If Michael's analysis were accurate, I'd agree that there is a robustness
>> issue, but I don't think there is.  See timeout.c:220:

> Actually that only sets a new timer after the nearest timeout has expired.

Mmm, yeah, you're right: as long as we keep on canceling timeout
requests before they are reached, this code won't notice a problem.
We need to check against signal_due_at, not only the next timeout
request, more or less as attached.  Do you want to try this and see
if it actually adds any robustness with your buggy code?

> I was thinking that to improve robustness, we could add a check for
> `now < signal_due_at` to schedule_alarm line 300:

>         if (signal_pending && now < signal_due_at && nearest_timeout >= signal_due_at)
>             return;

I don't think that merging this issue into that test is appropriate;
the logic is already complicated and hard to explain.  Also, it seems
to me that some slop is needed, since as mentioned nearby, the kernel's
opinion of when to fire the interrupt is likely to be a bit later
than ours.  I'm not wedded to the 10ms slop proposed below, but it
seems like it's probably in the right ballpark.

> One other thing that struck me when reading this code: the variable
> signal_due_at is not declared as volatile sig_atomic_t, even though it
> is read from / written to by the signal handler. Maybe that could
> cause problems?

Hm.  I'm not really convinced there's any issue, since signal_due_at
is only examined when signal_pending is true.  Still, it's probably
better for all these variables to be volatile, so done.

> One other question: I've fixed our custom function, so that it
> correctly restores any interval timers that were running, but of
> course if our function takes longer than the remaining time on the
> postgres timer the signal will be delivered late. Beyond the fact that
> a deadlock or statement timeout will take longer than expected, are
> there any other negative consequences? Are any of the timeouts
> time-critical such that being delayed by a few seconds would cause a
> problem?

They're certainly not supposed to be that time-critical; any code that
is expecting such a thing is going to have lots of problems already.
PG is not a hard-real-time system ;-)

			regards, tom lane

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index ac57073033..f38c85df74 100644
--- a/src/backend/utils/misc/timeout.c
+++ b/src/backend/utils/misc/timeout.c
@@ -71,11 +71,12 @@ static volatile sig_atomic_t alarm_enabled = false;
 
 /*
  * State recording if and when we next expect the interrupt to fire.
+ * (signal_due_at is valid only when signal_pending is true.)
  * Note that the signal handler will unconditionally reset signal_pending to
  * false, so that can change asynchronously even when alarm_enabled is false.
  */
 static volatile sig_atomic_t signal_pending = false;
-static TimestampTz signal_due_at = 0;	/* valid only when signal_pending */
+static volatile TimestampTz signal_due_at = 0;
 
 
 /*****************************************************************************
@@ -217,10 +218,21 @@ schedule_alarm(TimestampTz now)
 
 		MemSet(&timeval, 0, sizeof(struct itimerval));
 
+		/*
+		 * If we think there's a signal pending, but current time is more than
+		 * 10ms past when the signal was due, then assume that the timeout
+		 * request got lost somehow and re-enable it.  (10ms corresponds to
+		 * the worst-case timeout granularity on modern systems.)  It won't
+		 * hurt us if the interrupt does manage to fire between now and when
+		 * we reach the setitimer() call.
+		 */
+		if (signal_pending && now > signal_due_at + 10 * 1000)
+			signal_pending = false;
+
 		/*
 		 * Get the time remaining till the nearest pending timeout.  If it is
 		 * negative, assume that we somehow missed an interrupt, and force
-		 * signal_pending off.  This gives us a chance to recover if the
+		 * signal_pending off.  This gives us another chance to recover if the
 		 * kernel drops a timeout request for some reason.
 		 */
 		nearest_timeout = active_timeouts[0]->fin_time;