On 2015-12-01 20:55:02 +0100, Peter J. Holzer wrote:
> On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote:
> > I suspect such an interaction because I cannot reproduce the problem
> > outside of a stored procedure. A standalone Perl script doing the same
> > requests doesn't get a timeout.
> [...]
> The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or
> setitimer(2) system call (I connected strace to a running postgres
> process just after I got the prompt from "psql" and before I typed
> "select * from mb_search('export');" (I used a different (but very
> similar) stored procedure for those tests because it is much easier to
> find a search which is slow enough to trigger a timeout at least
> sometimes than a data request (which normally finishes in
> milliseconds)).
>
> So I guess my next task will be to find out where that SIGALRM comes
> from and/or whether I can just restart the zmq_msg_recv if it happens.

Ok, I think I know where that SIGALRM comes from: It's the
AuthenticationTimeout. What I'm seeing in strace (if I attach it early
enough) is that during authentication the postgres worker process calls
setitimer with a 60 second timeout twice. This matches the comment in
backend/postmaster/postmaster.c:

     * Note: AuthenticationTimeout is applied here while waiting for the
     * startup packet, and then again in InitPostgres for the duration of any
     * authentication operations. So a hostile client could tie up the
     * process for nearly twice AuthenticationTimeout before we kick him off.

As explained in backend/utils/misc/timeout.c, the interval timers
themselves are never cancelled: If a timeout is cancelled, the SIGALRM
still fires, but postgres just sees that it has nothing to do and
resumes whatever it was doing.

This is also what I'm seeing: 60 seconds after start, the process
receives a SIGALRM.

If the process is idle or in a "normal" SQL statement at the time,
that's not a problem. But if it is in one of my stored procedures which
is currently calling a ØMQ function which is waiting for some I/O
(zmq_msg_recv(), most likely), that call gets interrupted and returns an
error which my code doesn't know how to handle (yet). So the error gets
back to the user. A strange interaction between postgres and ØMQ indeed.
But now that I know what's causing it I can handle that.

Thanks for your patience.

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@xxxxxx         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
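
For illustration only, here is a minimal sketch of the "just restart the call" idea, assuming plain POSIX read() on a pipe as a stand-in for zmq_msg_recv() (which, per its documentation, likewise fails with EINTR when a signal arrives before a message is available). The names on_alarm, read_retrying and demo_interrupted_read are made up for this sketch, and the 100 ms one-shot timer merely plays the role of the leftover 60 second AuthenticationTimeout timer. It is not the code from the stored procedure.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for sigaction, setitimer, nanosleep */
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler (hypothetical name): it only ensures that SIGALRM
 * interrupts a blocking call instead of killing the process. */
static void on_alarm(int signo) { (void)signo; }

/* Restart read() whenever a signal interrupts it before any data
 * arrived. *interruptions counts how often that happened. */
static ssize_t read_retrying(int fd, void *buf, size_t len, int *interruptions)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0 || errno != EINTR)
            return n;          /* data, EOF, or a real error */
        ++*interruptions;      /* stray signal: simply try again */
    }
}

/* Demo: a stale timer fires while we block on a slow peer.
 * Returns the number of EINTR restarts, or -1 on failure. */
int demo_interrupted_read(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;  /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {            /* child: the slow peer, answers after 300 ms */
        struct timespec ts = {0, 300 * 1000 * 1000L};
        close(fds[0]);
        nanosleep(&ts, NULL);
        if (write(fds[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);

    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 100 * 1000;  /* one-shot timer, fires after 100 ms */
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0)
        return -1;

    char c;
    int interruptions = 0;
    ssize_t n = read_retrying(fds[0], &c, 1, &interruptions);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? interruptions : -1;
}
```

Calling demo_interrupted_read() from a small main() should normally report one restart: the timer fires while read() is blocked, read() fails with EINTR, and the retry then picks up the byte the child writes later. The same loop shape would presumably apply around zmq_msg_recv(), checking for a return value of -1 with errno set to EINTR.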