On Sun 2019-09-01 19:46:19 -0400, Tom Lane wrote: > I poked at this for awhile and concluded that what you must be seeing is > that the statement timeout interrupt triggers, but no CHECK_FOR_INTERRUPTS > call happens thereafter, until we get to the disable_statement_timeout() > call in finish_xact_command(). That throws away the timeout-happened > indicator, but it does nothing to reset InterruptPending or > QueryCancelPending. Those flags will also be thrown away eventually, > after we've read a new client command: > > /* > * (4) disable async signal conditions again. > * > * Query cancel is supposed to be a no-op when there is no query in > * progress, so if a query cancel arrived while we were idle, just > * reset QueryCancelPending. ProcessInterrupts() has that effect when > * it's called when DoingCommandRead is set, so check for interrupts > * before resetting DoingCommandRead. > */ > CHECK_FOR_INTERRUPTS(); > DoingCommandRead = false; > > Ignoring the timeout if it happened too late is what we're supposed > to be doing, so that's fine so far. > > But: what happens if some other CHECK_FOR_INTERRUPTS occurs before we > reach the main input loop? Of course, exactly the symptom you saw: > a bogus "canceling statement due to user request" error report. > What's worse, that happens after we've completed the transaction, leaving > the client with a very misleading idea of whether the command committed > or not. > > So how could that happen? After experimenting, I found a pretty > obvious candidate: statement duration logging happens after we > do finish_xact_command, and there's a CHECK_FOR_INTERRUPTS in > errfinish(). So I hypothesize that you have log_duration or > log_min_duration_statement enabled. Yes, I have log_min_duration_statement set to 0! (log_duration is off) > I was able to replicate the problem by lobotomizing a couple of > CHECK_FOR_INTERRUPTS that would normally happen in a simple > "select pg_sleep()" query (see attached for-demonstration- > purposes-only patch). Then I get > > regression=# set statement_timeout TO '1s'; > SET > regression=# select pg_sleep(2.0); -- completes normally after 2 sec > pg_sleep > ---------- > > (1 row) > > regression=# set log_min_duration_statement TO '100ms'; > SET > regression=# select pg_sleep(2.0); -- sits for 2 sec, then: > ERROR: canceling statement due to user request > > So that's not very good and we ought to do something about it. > We can't have disable_statement_timeout() clear QueryCancelPending, > because that might lose state about some other cause for query > cancel. I'm inclined to think that we need to split QueryCancelPending > into more than one flag. > > A separate question is how come the particular query you're complaining > about has (seemingly) a fairly wide window where it never does any > CHECK_FOR_INTERRUPTS call before terminating. Perhaps there's someplace > we need to sprinkle a few more of those. Yeah, it is strange. There are many queries in this system with statement timeouts and this is the only one where I see this and it happens several times a day. Thank you so much for looking into this. An impressive analysis!