Hi,

We're getting a deadlock in our application (a web application with a PostgreSQL backend), which I've traced to libpq. I started our application in gdb and, when it hung, inspected the backtraces. I found a couple of threads I can account for (listening for new connections, background processes) and 77 threads waiting for a mutex lock:

#0  0x00007ffff523d464 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007ffff52385d9 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007ffff52383fb in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007ffff6160650 in ?? () from /usr/lib/libpq.so.5 ==> pg_lockingcallback
#4  0x00007ffff440b791 in ?? () from /lib/libcrypto.so.0.9.8
#5  0x00007ffff440bcc9 in ?? () from /lib/libcrypto.so.0.9.8
#6  0x00007ffff47652fb in SSL_new () from /lib/libssl.so.0.9.8
#7  0x00007ffff61604dc in ?? () from /usr/lib/libpq.so.5 ==> pqsecure_open_client
#8  0x00007ffff61525ce in PQconnectPoll () from /usr/lib/libpq.so.5
#9  0x00007ffff6152f5e in ?? () from /usr/lib/libpq.so.5 ==> connectDBComplete
#10 0x00007ffff6153c5f in PQconnectdb () from /usr/lib/libpq.so.5
#11 0x0000000000f9b518 in sccR_info ()
#12 0x0000000000000000 in ?? ()

So it seems everything is waiting for a lock on a mutex from pq_lockarray (in fe-secure.c@846).

Does anybody have any idea how this can happen? Is this something we're doing wrong (I hope so) or a bug in libpq?

Some background: this happens only after a couple of thousand requests (each doing about 15 database calls), with occasional other requests coming in at the same time. Our server uses a Haskell binding to libpq (HDBC [1] and HDBC-postgresql [2]). Both client and server run on the same machine, running 64-bit Ubuntu 10.04. The database version is "PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit". I'm not sure how to determine the libpq version, but it is the most recent that comes with this Ubuntu.
The changelogs for Ubuntu suggest 8.4.7 as well. Connections are via TCP/IP to 127.0.0.1 with SSL turned on. The machine is under some CPU load when this happens. There is plenty of free memory.

When I turned off SSL or connected via domain sockets, we got different errors that are possibly related: occasionally, the connection between client (our app) and server (database) is lost. On the client, we get:

  connectPostgreSQL: server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

and on the server:

  could not send data to client: Broken pipe

There is no further context around these messages.

Any help would be greatly appreciated.

Sincerely,

--
Erik Hesselink
http://silkapp.com

[1] http://hackage.haskell.org/package/HDBC
[2] http://hackage.haskell.org/package/HDBC-postgresql