On Wed, Mar 04, 2009 at 03:14:51PM -0500, Ray Stell wrote:
> On Wed, Mar 04, 2009 at 03:06:12PM -0500, Ray Stell wrote:
> > Testing pg_standby in 8.3.6. I've gotten this standby into some sort of
> > bind. It seems like it may be waiting for some WAL. How can I tell
> > what it is waiting on? I don't really know how this works, so I may
> > say something silly. The standby log says:
>
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG: database system was interrupted; last known up at 2009-03-04 12:20:29 EST
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG: starting archive recovery
> ,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG: restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'
>
>
> alerts_oamp]$ cat postmaster.pid
> 2510
> /data/pgsql/alerts_oamp
> 5498001 4194312
>
> alerts_oamp]$ ps -ef | grep 1005
> 1005 903 901 0 10:10 ? 00:00:00 sshd: postgresql@pts/0
> 1005 904 903 0 10:10 pts/0 00:00:00 -bash
> 1005 1016 1013 0 10:21 ? 00:00:00 sshd: postgresql@pts/1
> 1005 1017 1016 0 10:21 pts/1 00:00:00 -bash
> 1005 2510 1 0 12:23 pts/0 00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
> 1005 2511 2510 0 12:23 ? 00:00:00 postgres: logger process
> 1005 2512 2510 0 12:23 ? 00:00:00 postgres: startup process
> 1005 2520 2512 0 12:23 ? 00:00:00 sh -c /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log
> 1005 2521 2520 0 12:23 ? 00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000
> 1005 2615 1017 0 12:27 pts/1 00:00:00 tail -f alerts_oamp-2009-03-04_122301.log
> 1005 3271 904 0 15:11 pts/0 00:00:00 ps -ef
> 1005 3272 904 0 15:11 pts/0 00:00:00 grep 1005
>
> alerts_oamp]$ ls -l /data/pgsql/wals/alerts_oamp/
> total 114828
> -rw------- 1 postgresql postgresql 16777216 Mar 4 11:28 00000002000000000000001A
> -rw------- 1 postgresql postgresql 16777216 Mar 4 11:29 00000002000000000000001B
> -rw------- 1 postgresql postgresql 16777216 Mar 4 12:24 00000002000000000000001C
> -rw------- 1 postgresql postgresql 16777216 Mar 4 12:25 00000002000000000000001D
> -rw------- 1 postgresql postgresql 16777216 Mar 4 12:26 00000002000000000000001E
> -rw------- 1 postgresql postgresql 16777216 Mar 4 14:45 00000002000000000000001F
> -rw------- 1 postgresql postgresql 16777216 Mar 4 14:45 000000020000000000000020
>
> any ideas what this guy is hurt by?

I stumbled into the source of the problem. I hope somebody who knows
the code can explain. I decided to bounce the primary just to see
if it would make a difference in the standby. The primary would not
restart:

,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,2,2009-03-06 10:34:01 EST,0, LOG: could not open file "pg_xlog/00000002000000000000001C" (log file 0, segment 28): No such file or directory
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,3,2009-03-06 10:34:01 EST,0, LOG: invalid checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,4,2009-03-06 10:34:01 EST,0, PANIC: could not locate required checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,5,2009-03-06 10:34:01 EST,0, HINT: If you are not restoring from a backup, try removing the file "/data/pgsql/alerts_oamp/backup_label".
,3093,,2009-03-06 10:34:01.910 EST,49b14269.c15,1,2009-03-06 10:34:01 EST,0, LOG: startup process (PID 3095) was terminated by signal 6: Aborted

So, I removed that file (backup_label) on the primary and restarted it.
Then I rebuilt the standby, and all is well.

So, why did that file muck up the standby and change the value pg
was passing to pg_standby?

Thanks, looking forward to 8.4!
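
P.S. For anyone trying to read the file names in the output above, here is a minimal sketch of how they break down, assuming the 8.3-era naming layout: 8 hex digits each for timeline, log file, and segment, with backup history files adding ".<offset>.backup". The helper name parse_wal_name and the WalName type are mine, made up for illustration, and are not part of PostgreSQL or pg_standby.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: TTTTTTTTLLLLLLLLSSSSSSSS, optionally followed by
# ".OOOOOOOO.backup" for a backup history file. All fields are hex.
_WAL_NAME = re.compile(
    r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(?:\.([0-9A-F]{8})\.backup)?$"
)


class WalName(NamedTuple):
    timeline: int
    log: int
    segment: int
    backup_offset: Optional[int]  # None for a plain WAL segment


def parse_wal_name(name: str) -> WalName:
    """Split a WAL segment or backup history file name into its fields."""
    m = _WAL_NAME.match(name)
    if m is None:
        raise ValueError(f"not a WAL segment or backup history name: {name!r}")
    tli, log, seg, off = m.groups()
    return WalName(
        int(tli, 16),
        int(log, 16),
        int(seg, 16),
        int(off, 16) if off is not None else None,
    )


# The segment the primary could not open ("log file 0, segment 28"):
print(parse_wal_name("00000002000000000000001C"))
# The name the startup process handed to pg_standby as %f:
print(parse_wal_name("00000002000000000000001C.00512178.backup"))
```

Read this way, the primary's complaint about "log file 0, segment 28" and the name the standby was waiting on both point at the same segment, 1C on timeline 2. The standby was being asked for a backup history file, not a plain segment, and no file with that .backup name shows up in the ls -l of the archive directory.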