On Monday 01 March 2010 @ 16:03, Ed L. wrote:
> On Monday 01 March 2010 @ 15:59, Ed L. wrote:
> > > This just happened again ~24 hours after full reload from
> > > backup.  Arrrgh.
> > >
> > > Backtrace looks the same again, same file, same
> > > __read_nocancel().  $PGDATA/global/pg_auth looks fine to
> > > me, permissions are 600, entries are 3 or more
> > > double-quoted items per line each separated by a space,
> > > items 3 and beyond being groups.
> > >
> > > Any clues?
>
> Also seeing lots of postmaster zombies (190 and growing)...

While new connections are hanging, top shows postmaster using
100% of cpu.  SIGTERM/SIGQUIT do nothing.  Here's a backtrace
of this busy postmaster:

(gdb) bt
#0  0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
#1  0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3  0x000000346f8689cb in getc () from /lib64/libc.so.6
#4  0x0000000000531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 "", bufsz=4096) at hba.c:128
#5  0x0000000000532233 in tokenize_file (filename=0x10359b70 "global", file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232
#6  0x00000000005322e9 in tokenize_file (filename=0x2b1c8cbf5800 "global/pg_auth", file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7  0x00000000005327ff in load_role () at hba.c:959
#8  0x000000000057f878 in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:3830
#9  <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
#11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
#12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x1033f000) at postmaster.c:1031
#13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188

...and more from the server logs, fwiw:

2010-03-01 17:30:24.213 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:30:31.250 CST [32236] DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:31:24.216 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:32:24.219 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:33:24.222 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:34:24.225 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:19.061 CST [32236] LOG:  checkpoint starting: time
2010-03-01 17:35:19.185 CST [32236] DEBUG:  recycled transaction log file "000000010000001C00000071"
2010-03-01 17:35:19.185 CST [32236] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.028 s, sync=0.000 s, total=0.124 s
2010-03-01 17:35:24.328 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:35:31.224 CST [32236] DEBUG:  transaction log switch forced (archive_timeout=300)
2010-03-01 17:36:44.332 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:44.434 CST [32238] WARNING:  worker took too long to start; cancelled
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  could not receive data from client: Connection timed out
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG:  unexpected EOF on client connection
2010-03-01 17:37:47.380 CST [3692] dba 10....(42816) dba LOG:  disconnection: session time: 2:11:15.303 user=dba database=dba host=... port=428

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general