Hi all,
this is somewhat involved so please bear with me.
We've found a situation where canceling a query may cause the client to
hang, possibly indefinitely. This can happen if the network connection
fails in a specific way.
The reason for this lies in the way the PQcancel function (which
eventually gets called from the higher level interface's cancel
function) is implemented. It works by opening a second connection to
the postmaster (on the same host/port as the existing connection),
send()-ing a cancellation message via the newly opened connection, then
calling recv() to receive an indication that the message was processed.
However, if the network fails in a way that the connection appears to
have been established but subsequent packets are dropped silently,
this recv() call will block.
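To illustrate the general technique of bounding such a wait, here is a
minimal sketch, assuming a POSIX system with poll() and a plain socket
descriptor. It is not libpq's code: recv_with_timeout is a hypothetical
helper name, and as far as I can tell PQcancel does not expose its
internal socket, so this cannot simply be dropped in from the outside.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of libpq): wait up to timeout_ms
 * milliseconds for data on sock, then read it.
 *
 * Returns the byte count from recv(), 0 if the peer closed the
 * connection, or -1 with errno set (ETIMEDOUT if nothing arrived).
 */
ssize_t
recv_with_timeout(int sock, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(buf != NULL);

    pfd.fd = sock;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /*
     * Retry if a signal interrupts the wait. Simplification: each retry
     * restarts the full timeout instead of using the remaining time.
     */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;              /* poll() itself failed */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* peer stayed silent */
        return -1;
    }

    /* Data, EOF, or a socket error is pending, so recv() will not block. */
    return recv(sock, buf, len, 0);
}
```

With a silent peer, a call such as recv_with_timeout(sock, buf,
sizeof buf, 5000) returns -1 with errno set to ETIMEDOUT after about
five seconds instead of blocking.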
My questions:
Is this known?
Is this a bug?
What can be done to fix or work around it, apart from applying a
timeout wrapper to the cancel operation as well?
It does sound familiar. Providing the version number(s) on which you encountered this behavior would be helpful, or HEAD if you are testing against current code.
David J.