On Thu, Mar 30, 2006 at 11:14:37AM -0500, Tom Lane wrote:
> Steve Linabery <slinabery@xxxxxxxxxxxxxxxx> writes:
> > Last night during backup, I noticed that I could not get a connection, either remotely via jdbc or locally via psql. Well perhaps that is inaccurate: 'ps -ef' showed several processes with a status message of "waiting to startup" (or something similar; sorry, I was late-night coding and didn't think to write it down).
>
> Hm, I don't recall any such status message in the code. Can you double
> check what it said exactly?

I just replicated the problem, see below for explanation, but here is ps output:

postgres  3201     1  0 Mar13 ?        01:58:14 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
postgres  3270  3201  0 Mar13 ?        00:04:11 postgres: writer process
postgres  3271  3201  0 Mar13 ?        00:11:28 postgres: stats buffer process
postgres  3272  3271  0 Mar13 ?        00:07:11 postgres: stats collector process
postgres  3360  3201  0 Mar29 ?        00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53524) idle
postgres  3363  3201  0 Mar29 ?        00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53536) idle
postgres  3364  3201  0 Mar29 ?        00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53537) idle
root     16467 16447  0 09:59 pts/0    00:00:00 su postgres
postgres 16468 16467  0 09:59 pts/0    00:00:00 bash
postgres 25817  3201 55 10:20 ?        00:02:54 postgres: postgres db_being_dumped [local] COPY
postgres 27956  3201 46 10:24 ?        00:00:36 postgres: postgres db_being_dumped [local] COPY
postgres 28124  3201  0 10:25 ?        00:00:00 postgres: dbusername template1 obfus.ip.address.36(49528) DROP DATABASE waiting
postgres 28180  3201  0 10:25 ?        00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.37(48997) startup waiting
postgres 28183  3201  0 10:25 ?        00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.36(49532) startup waiting

>
> What do you mean by "could not get a connection" ...
> did it fail (if so, with what client-side error message) or just hang up
> waiting?

All connection attempts just hung. No error message on the client side.

Here's what I did to replicate the problem:

0) first tried running pg_dump and there were no problems with connecting (from various web applications, mail server, etc)

1) repeated what I did last night: am developing a java webapp with OJB, which uses torque & xdoclet to generate SQL for the various classes in the project. Part of the torque xdoclet task DROPS the database and recreates it (only while you're in development mode...still have to figure out how to turn that off!).

2) postgresql hung once I ran the ant task and torque was trying to drop the database, as you can see from the ps output above.

So, what I'm gathering is that dropping a database counts as one of the operations that will hang, along with 'VACUUM FULL', if pg_dump is running. Correct?

I ran across a reference to 'VACUUM FULL' in the docs w.r.t. pg_dump in the context of troubleshooting this problem last night, but can't find it now.

Thanks!

-- 
Steve Linabery, sysadmin/developer
B94B C3C7 8A27 FF09 3C9D E992 5A20 2492 D5F5 EE51
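
For anyone who wants to spot this pattern without reading the ps listing by eye, here is a minimal sketch in Python. It assumes the `ps -ef` column layout and the backend process-title format seen in the output earlier in this message ("postgres: user database client activity"); the names `BACKEND_RE` and `waiting_backends` are made up for illustration, and a user or database name containing spaces would confuse it.

```python
import re

# Assumed backend title layout, taken from the ps output in this message:
#   postgres: <user> <database> <client> <activity...>
BACKEND_RE = re.compile(r"postgres: (\S+) (\S+) (\S+) (.+)$")


def waiting_backends(ps_output):
    """Return (pid, user, database, activity) for backends whose title ends in 'waiting'."""
    found = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.strip().split(None, 7)
        if len(fields) < 8:
            continue
        match = BACKEND_RE.match(fields[7])
        if match is None:
            # Not a client backend (postmaster, writer, stats processes, shells, ...)
            continue
        user, database, _client, activity = match.groups()
        if activity.endswith("waiting"):
            found.append((int(fields[1]), user, database, activity))
    return found
```

Feeding it the text of `ps -ef` should list the `DROP DATABASE waiting` and `startup waiting` backends while skipping the idle connections and the two COPY backends that belong to pg_dump.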