Dmitry Samonenko wrote:
> I have an application which uses libpq for interaction with remote PostgreSQL 9.2.4 server. Clients
> and Server nodes are running Linux and connection is established using TCPv4. The client application
> has some small fault-tolerance features, which are activated when server related problems are
> encountered.
>
> One day some bad things happened with network layer hardware and, long story short, host with PSQL
> server got isolated. All TCP messages routed to server node were NOT delivered or acknowledged in any
> way. Client application got blocked in libpq code according to debugger.
>
> I have successfully reproduced the problem in the laboratory environment. These iptables commands
> should be run on the server node after some period of client <-> server interaction:
>
> # iptables -A OUTPUT -p tcp --sport 5432 -j DROP
> # iptables -A INPUT -p tcp --dport 5432 -j DROP
>
> I made a glimpse over master branch of libpq sources and some questions arose. Namely:
>
> 1. Connection to PSQL server is made without an option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is
> that? Is setting socket timeouts considered harmful?
>
> 2. PQexec ultimately leads to PQwait, which after some function calls "lands" in pqSocketCheck and
> pqSocketPoll. These 2 functions have parameter end_time. It is set (-1) for PQexec scenario, which
> leads to infinite poll timeout in pqSocketPoll. Is it possible to implement configurable timeout for
> PQexec calls? Is there some implemented features, which should be used to handle situation like this?
>
> Currently, I have changed Linux kernel tcp4 stack counters responsible for retransmission, so OS
> actually closes socket after some period. This is detected by pqSocketPoll's poll and libpq handles
> situation correctly - error is reported to my application. But it's just a workaround.
>
> So, this infinite poll situation looks like imperfection to me and I think it should be considered as
> a bug. Is it?

In PostgreSQL you can handle the problem of dying connections by setting the
tcp_keepalives_* parameters
(see http://www.postgresql.org/docs/current/static/runtime-config-connection.html).

That should take care of the problem, right?

Yours,
Laurenz Albe
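
For illustration, here is a minimal sketch of what TCP keepalive settings amount to at the socket level on Linux. The helper name enable_keepalive and the values used are assumptions made for this example; it is not libpq or PostgreSQL code. On the client side, libpq exposes the corresponding knobs as the connection parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count, so an application would normally set those in the connection string rather than touch the socket itself.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of libpq): turn on TCP keepalive for an
 * already created TCP socket and set the Linux-specific timing options.
 *
 *   idle_s     seconds of idleness before the first probe is sent
 *   interval_s seconds between unanswered probes
 *   count      unanswered probes before the connection is declared dead
 *
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    /* Enable keepalive probing on this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* Idle time before the first probe (Linux: TCP_KEEPIDLE). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    /* Delay between successive probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    /* Number of unanswered probes tolerated. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

With assumed values such as enable_keepalive(fd, 60, 10, 5), an idle connection to an unreachable peer should be reported as broken after roughly 60 + 5 * 10 = 110 seconds, at which point a blocked poll() on the socket returns with an error condition.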