On Thu, Mar 24, 2011 at 15:21, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Thu, Mar 24, 2011 at 9:07 AM, Erik Hesselink <hesselink@xxxxxxxxx> wrote:
>> On Thu, Mar 24, 2011 at 14:23, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
>>> On Thu, Mar 24, 2011 at 4:17 AM, Erik Hesselink <hesselink@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> We're getting a deadlock in our application (a web application with a
>>>> PostgreSQL backend) which I've traced to libpq. I've started our
>>>> application in gdb, and when it hangs, I've inspected the backtraces.
>>>> I've found a couple of threads I can account for (listening for new
>>>> connections, background processes) and 77 threads waiting for a mutex
>>>> lock:
>>>>
>>>> #0 0x00007ffff523d464 in __lll_lock_wait () from /lib/libpthread.so.0
>>>> #1 0x00007ffff52385d9 in _L_lock_953 () from /lib/libpthread.so.0
>>>> #2 0x00007ffff52383fb in pthread_mutex_lock () from /lib/libpthread.so.0
>>>> #3 0x00007ffff6160650 in ?? () from /usr/lib/libpq.so.5
>>>> ==> pg_lockingcallback
>>>> #4 0x00007ffff440b791 in ?? () from /lib/libcrypto.so.0.9.8
>>>> #5 0x00007ffff440bcc9 in ?? () from /lib/libcrypto.so.0.9.8
>>>> #6 0x00007ffff47652fb in SSL_new () from /lib/libssl.so.0.9.8
>>>> #7 0x00007ffff61604dc in ?? () from /usr/lib/libpq.so.5
>>>> ==> pqsecure_open_client
>>>> #8 0x00007ffff61525ce in PQconnectPoll () from /usr/lib/libpq.so.5
>>>> #9 0x00007ffff6152f5e in ?? () from /usr/lib/libpq.so.5
>>>> ==> connectDBComplete
>>>> #10 0x00007ffff6153c5f in PQconnectdb () from /usr/lib/libpq.so.5
>>>> #11 0x0000000000f9b518 in sccR_info ()
>>>> #12 0x0000000000000000 in ?? ()
>>>>
>>>> So it seems everything is waiting for a lock on a mutex from
>>>> pq_lockarray (in fe-secure.c@846). Does anybody have any idea how this
>>>> can happen? Is this something we're doing wrong (I hope so) or a bug
>>>> in libpq?
>>>>
>>>> Some background: this happens only after a couple of thousand requests
>>>> (each doing about 15 database calls), with occasional other requests
>>>> coming in at the same time. Our server uses a Haskell binding to libpq
>>>> (HDBC [1] and HDBC-postgresql [2]). Both client and server run on the
>>>> same machine, running 64bit Ubuntu 10.04. The database version is
>>>> "PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
>>>> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit". I'm not sure how to determine
>>>> the libpq version, but it is the most recent that comes with this
>>>> Ubuntu. The changelogs for Ubuntu suggest 8.4.7 as well. Connections
>>>> are via TCP/IP to 127.0.0.1 with SSL turned on. The machine is under
>>>> some CPU load when this happens. There is plenty of free memory.
>>>>
>>>> When I turned off SSL or connected via domain sockets, we got different
>>>> errors that are possibly related: occasionally, the connection between
>>>> client (our app) and server (database) is lost. On the client, we get:
>>>>
>>>> connectPostgreSQL: server closed the connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>>
>>>> and on the server:
>>>>
>>>> could not send data to client: Broken pipe
>>>>
>>>> There is no further context around these messages.
>>>>
>>>> Any help would be greatly appreciated.
>>>
>>> How did you initialize ssl? You are waiting inside a lock that is
>>> getting set up inside the crypto library. Unless you are having some
>>> type of library initialization issue, I'm suspicious the problem is
>>> really inside libpq. Is your application multithreaded, and if so are
>>> you properly synchronizing access to the connection object, etc?
>>
>> What do you mean exactly with "How did you initialize ssl"? I found
>> [1], which I did not know about.
>> This seems to be a very non-local
>> problem: if one of our dependencies initializes ssl, and I use libpq
>> as well, this will go wrong. I've done a quick look through all our
>> dependencies, and none seem to use libcrypto or libssl.
>
> *something* must be initializing ssl, or you can't make secure
> connections from libpq. You need to find out which pq ssl init
> function is being called, when it is being called, and with what
> arguments. One of the main things PQinitSSL does is set up a lock
> vector which it passes to the crypto library. The fact you are having
> blocking issues around those locks is suggesting SSL was not set up
> properly, something happened after being set up so that the locks are
> no longer good, you have an application thread issue (although that
> sounds unlikely), or (least likely worst case) there is a bug in
> crypto.

From the PostgreSQL documentation I linked to in my last post, it
seems that if I do not call PQinitOpenSSL and I do not initialize the
libraries myself, libpq will do it for me. Is that correct? If so, then
that is what is happening in my case.

Regards,

--
Erik Hesselink
http://silkapp.com

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
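
For readers following the thread, the "lock vector" and locking callback
discussed above can be illustrated without libpq or OpenSSL. What follows
is only a minimal sketch of the general pattern, not libpq's actual code.
The callback has the same shape as the one OpenSSL 0.9.8 accepts through
CRYPTO_set_locking_callback; every name starting with sketch_ or SKETCH_
is hypothetical, and the fixed lock count is an assumption standing in
for whatever CRYPTO_num_locks() would report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: real code would use OpenSSL's CRYPTO_LOCK flag
 * and size the array from CRYPTO_num_locks(). */
#define SKETCH_LOCK        1
#define SKETCH_NUM_LOCKS   8
#define SKETCH_MAX_THREADS 16

static pthread_mutex_t sketch_lockarray[SKETCH_NUM_LOCKS];
static pthread_once_t  sketch_once = PTHREAD_ONCE_INIT;
static long            sketch_counter = 0;

/* Initialize every mutex in the lock vector. */
static void sketch_init_locks(void)
{
    for (int i = 0; i < SKETCH_NUM_LOCKS; i++)
        pthread_mutex_init(&sketch_lockarray[i], NULL);
}

/* Lock or unlock mutex number n, depending on the mode flag. */
static void sketch_locking_callback(int mode, int n, const char *file, int line)
{
    (void) file;
    (void) line;
    assert(n >= 0 && n < SKETCH_NUM_LOCKS);
    if (mode & SKETCH_LOCK)
        pthread_mutex_lock(&sketch_lockarray[n]);
    else
        pthread_mutex_unlock(&sketch_lockarray[n]);
}

/* Each worker increments the shared counter 10000 times under lock 0. */
static void *sketch_worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 10000; i++)
    {
        sketch_locking_callback(SKETCH_LOCK, 0, __FILE__, __LINE__);
        sketch_counter++;
        sketch_locking_callback(0, 0, __FILE__, __LINE__);
    }
    return NULL;
}

/* Start nthreads workers, wait for them, and return the final counter. */
static long sketch_run(int nthreads)
{
    pthread_t threads[SKETCH_MAX_THREADS];

    assert(nthreads > 0 && nthreads <= SKETCH_MAX_THREADS);

    /* pthread_once guarantees the lock vector is set up exactly once,
     * no matter how many callers reach this point. */
    pthread_once(&sketch_once, sketch_init_locks);

    sketch_counter = 0;
    for (int i = 0; i < nthreads; i++)
    {
        int rc = pthread_create(&threads[i], NULL, sketch_worker, NULL);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);

    return sketch_counter;
}
```

Calling sketch_run(4) from a program built with -pthread should return
40000, since every increment happens while lock 0 is held. The
pthread_once guard is the part that relates to this thread: if the lock
vector were re-initialized or replaced while other threads held or waited
on its mutexes, those threads could block indefinitely, which is one of
the scenarios raised above.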