Re: standby waiting for what?

Ray Stell <stellr@xxxxxxxxxx> · Fri, 6 Mar 2009 11:19:08 -0500

On Wed, Mar 04, 2009 at 03:14:51PM -0500, Ray Stell wrote:
> On Wed, Mar 04, 2009 at 03:06:12PM -0500, Ray Stell wrote:
> > Testing pg_standby in 8.3.6.  I've gotten this standby into some sort of 
> > bind.  It seems like it may be waiting for some WAL.   How can I tell
> > what it is waiting on?  I don't really know how this works, so I may 
> 
> 
> say something silly.  The standby log says:
> 
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG:  database system was interrupted; last known up at 2009-03-04 12:20:29 EST
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG:  starting archive recovery
> ,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG:  restore_command = '/usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'
> 
> 
> alerts_oamp]$ cat postmaster.pid 
> 2510
> /data/pgsql/alerts_oamp
>   5498001   4194312
> 
> alerts_oamp]$ ps -ef | grep 1005
> 1005       903   901  0 10:10 ?        00:00:00 sshd: postgresql@pts/0
> 1005       904   903  0 10:10 pts/0    00:00:00 -bash
> 1005      1016  1013  0 10:21 ?        00:00:00 sshd: postgresql@pts/1
> 1005      1017  1016  0 10:21 pts/1    00:00:00 -bash
> 1005      2510     1  0 12:23 pts/0    00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
> 1005      2511  2510  0 12:23 ?        00:00:00 postgres: logger process                                   
> 1005      2512  2510  0 12:23 ?        00:00:00 postgres: startup process                                  
> 1005      2520  2512  0 12:23 ?        00:00:00 sh -c /usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log
> 1005      2521  2520  0 12:23 ?        00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000
> 1005      2615  1017  0 12:27 pts/1    00:00:00 tail -f alerts_oamp-2009-03-04_122301.log
> 1005      3271   904  0 15:11 pts/0    00:00:00 ps -ef
> 1005      3272   904  0 15:11 pts/0    00:00:00 grep 1005
> 
> alerts_oamp]$ ls -l /data/pgsql/wals/alerts_oamp/
> total 114828
> -rw------- 1 postgresql postgresql 16777216 Mar  4 11:28 00000002000000000000001A
> -rw------- 1 postgresql postgresql 16777216 Mar  4 11:29 00000002000000000000001B
> -rw------- 1 postgresql postgresql 16777216 Mar  4 12:24 00000002000000000000001C
> -rw------- 1 postgresql postgresql 16777216 Mar  4 12:25 00000002000000000000001D
> -rw------- 1 postgresql postgresql 16777216 Mar  4 12:26 00000002000000000000001E
> -rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 00000002000000000000001F
> -rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 000000020000000000000020
> 
> any ideas what this guy is hurt by?
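
On the "how can I tell what it is waiting on" question: the ps output above already carries the answer, since the second positional argument to pg_standby is the file the startup process asked for. A minimal sketch that pulls it out of a ps line follows. The function name is made up for illustration, and it assumes pg_standby is invoked without option flags, as in the restore_command shown above.

```python
import os
import shlex


def pg_standby_request(ps_line):
    """Return (archive_dir, requested_file) for a ps line that runs pg_standby.

    Returns None when the line does not run pg_standby. Assumes no option
    flags precede the positional arguments (hypothetical helper, not part
    of PostgreSQL).
    """
    tokens = shlex.split(ps_line)
    for i, tok in enumerate(tokens):
        if os.path.basename(tok) == "pg_standby":
            args = tokens[i + 1:]
            if len(args) < 2:
                return None
            # args[0] is the archive directory, args[1] is the %f value.
            return args[0], args[1]
    return None


def is_archived(archive_dir, requested_file):
    """True if the requested file is already present in the archive directory."""
    return os.path.exists(os.path.join(archive_dir, requested_file))
```

Fed the pg_standby line from the ps listing, this yields the archive directory /data/pgsql/wals/alerts_oamp and the requested name 00000002000000000000001C.00512178.backup. No file of that name appears in the ls output, which is consistent with pg_standby sitting there waiting.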

I stumbled into the source of the problem.  I hope somebody who knows the code can explain.
I decided to bounce the primary just to see if it would make a difference in the standby.
The primary would not restart:

,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,2,2009-03-06 10:34:01 EST,0, LOG:  could not open file "pg_xlog/00000002000000000000001C" (log file 0, segment 28): No such file or directory
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,3,2009-03-06 10:34:01 EST,0, LOG:  invalid checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,4,2009-03-06 10:34:01 EST,0, PANIC:  could not locate required checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,5,2009-03-06 10:34:01 EST,0, HINT:  If you are not restoring from a backup, try removing the file "/data/pgsql/alerts_oamp/backup_label".
,3093,,2009-03-06 10:34:01.910 EST,49b14269.c15,1,2009-03-06 10:34:01 EST,0, LOG:  startup process (PID 3095) was terminated by signal 6: Aborted

So I removed backup_label and restarted.  Rebuilt the standby and all is well.  So why did backup_label muck up the standby and
change the value postgres was passing to pg_standby?
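
For anyone comparing the names in the logs above, they can be decoded mechanically. The sketch below assumes the 8.3-style layout of three 8-digit hex fields (timeline, log file, segment), with an optional ".XXXXXXXX.backup" suffix that the PostgreSQL backup documentation describes as a backup history file. The helper name is hypothetical.

```python
import re

# timeline, log file, segment, then an optional ".<offset>.backup" suffix
_WAL_NAME = re.compile(
    r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(?:\.([0-9A-F]{8})\.backup)?$"
)


def describe_wal_name(name):
    """Decode a WAL-style file name into its numeric fields.

    Returns None for names that do not match the assumed layout.
    Hypothetical helper for illustration, not a PostgreSQL API.
    """
    m = _WAL_NAME.match(name)
    if m is None:
        return None
    timeline, log, segment = (int(g, 16) for g in m.group(1, 2, 3))
    kind = "backup history" if m.group(4) else "segment"
    return {"timeline": timeline, "log": log, "segment": segment, "kind": kind}
```

Applied to 00000002000000000000001C this gives timeline 2, log file 0, segment 28, matching the "(log file 0, segment 28)" in the primary's log. Applied to 00000002000000000000001C.00512178.backup it gives the same three numbers with kind "backup history", so the standby was asking for a backup history file rather than a plain segment.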

Thanks, looking forward to 8.4!

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)