Excerpts from Pablo Delgado Díaz-Pache's message of Thu Nov 18 08:57:16 -0300 2010:

> 2) We did a strace to the postmaster pid. However we had 2 postmasters not
> dead
>
> # ps -fea |grep -i postmaster
> postgres 3889 1 0 Nov16 ? 00:01:24 /usr/bin/postmaster -p 5432
> -D /var/lib/pgsql/data
> postgres 7601 3889 0 12:37 ? 00:00:00 /usr/bin/postmaster -p 5432
> -D /var/lib/pgsql/data
>
> As soon as we did a "strace" to the 3889 pid everything started to work
> again.

Sorry for my previous response; evidently I failed to scroll down far
enough to notice this part. It seems to me that this process was stuck
in an unnatural way.

> Not sure it was a coincidence but that was the way it was.
>
> *# strace -p 3889*
> *Process 3889 attached - interrupt to quit*
> *select(6, [3 4 5], NULL, NULL, {56, 930000}) = ? ERESTARTNOHAND (To be
> restarted)*
> *--- SIGUSR1 (User defined signal 1) @ 0 (0) ---*
> *rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN
> RT_1], NULL, 8) = 0*

This looks like normal postmaster activity: receiving SIGUSR1, then
SIGCHLD, and acting on them accordingly.

Rather than a coincidence, I would think that the act of tracing the
process is what brought it back to life. A kernel bug, maybe? Have you
upgraded your kernel or libc lately?

> I also straced the other postmaster pid
>
> *# strace -p 7601*
> *Process 7601 attached - interrupt to quit*
> *recvfrom(8, "P\0\0\0\221\0select id_key from transla"..., 8192, 0, NULL,
> NULL) = 181*

This one looks like a regular postmaster child (a backend) that hadn't
gotten around to changing its ps status yet. (Note that it had PPID 3889,
which is consistent with this idea.)

--
Álvaro Herrera <alvherre@xxxxxxxxxxxxxxxxx>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
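
The parent/child relationship between the two postmaster PIDs quoted above can be checked directly instead of being read off the `ps -fea` listing. The following is only a minimal sketch: the helper names `ppid_of` and `is_child_of` are made up for illustration, and the PIDs in the comments are simply the ones from this thread.

```shell
#!/bin/sh
# Print the parent PID of the given PID.
# "ppid=" asks ps for the PPID column with an empty header (POSIX ps).
ppid_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

# Succeed if process $1 is a direct child of process $2.
is_child_of() {
    [ "$(ppid_of "$1")" = "$2" ]
}

# Usage, with the PIDs from the ps output in this thread:
#   is_child_of 7601 3889 && echo "7601 was forked by postmaster 3889"
#
# When a process looks stuck, its state and kernel wait channel can also
# be inspected (column names as in Linux procps; an assumption about the
# platform, which the thread does not state):
#   ps -o pid,ppid,stat,wchan:20,args -p 3889
```

A child whose PPID is the main postmaster is most likely an ordinary backend, whatever its command line still shows in `ps`.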