Re: Some problem with warm standby server

Nico Sabbi <nsabbi@xxxxxxxxxxxxxxxxxxx> · Tue, 08 May 2007 18:05:41 +0200

Simon Riggs wrote:

Then I updated the master with a batch of inserts, but after a while the
slave stopped with these messages:

LOG:  restored log file "000000010000000000000021" from archive
LOG:  record with zero length at 0/21000048
LOG:  invalid primary checkpoint record
LOG:  restored log file "000000010000000000000020" from archive
LOG:  restored log file "000000010000000000000021" from archive
LOG:  invalid resource manager ID in secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 19619) was terminated by signal 6
LOG:  aborting startup due to startup process failure
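For anyone reading the log above: in this release series a WAL file name is 24 hex digits, made up of timeline ID, log file ID and segment number. A location such as 0/21000048 is a log file ID plus a byte offset. The sketch below decodes both forms and shows that the zero-length record at 0/21000048 lies near the start of segment ...21, the file that had just been restored. It assumes the default 16 MB segment size, which matches "Bytes per WAL segment" in the pg_controldata output. The function names are hypothetical helpers for illustration and do not come from PostgreSQL.

```python
# Sketch (hypothetical helpers): decode pre-9.3 style WAL segment names and
# "X/Y" log locations. Assumes 16 MB WAL segments, as reported by
# pg_controldata in this thread.

WAL_SEG_SIZE = 16 * 1024 * 1024  # 16777216 bytes per segment


def parse_wal_name(name: str) -> tuple[int, int, int]:
    """Split a 24-hex-digit WAL file name into (timeline, log_id, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def parse_lsn(lsn: str) -> tuple[int, int]:
    """Split an 'X/Y' location into (log_id, byte offset within that log file)."""
    log_id, offset = lsn.split("/")
    return int(log_id, 16), int(offset, 16)


def lsn_in_segment(lsn: str, wal_name: str) -> bool:
    """Return True if the location falls inside the given WAL segment file."""
    _timeline, log_id, seg = parse_wal_name(wal_name)
    lsn_log, offset = parse_lsn(lsn)
    return lsn_log == log_id and offset // WAL_SEG_SIZE == seg


if __name__ == "__main__":
    print(parse_wal_name("000000010000000000000021"))  # (1, 0, 33)
    print(lsn_in_segment("0/21000048", "000000010000000000000021"))  # True
```

Segment 0x21 is 33 in decimal, so the bad record sits 0x48 bytes into that file.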

Please run pg_controldata to print out the control file.

Hi, sorry for the long delay.
First of all, I had to stop postgres with pg_ctl stop -m immediate;
otherwise it wouldn't die because of the ongoing replication.

This is the output of pg_controldata:

postgres@www3:/usr/local/postgres_replica/data$ pg_controldata /usr/local/postgres_replica/data/
pg_control version number:            812
Catalog version number:               200510211
Database system identifier:           5001030714849737714
Database cluster state:               in recovery
pg_control last modified:             Fri 27 Apr 2007 13:20:46 CEST
Current log file ID:                  0
Next log file segment:                26
Latest checkpoint location:           0/190C7E04
Prior checkpoint location:            0/190C7DC0
Latest checkpoint's REDO location:    0/190C7E04
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          3698809
Latest checkpoint's NextOID:          68745
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            Fri 27 Apr 2007 11:53:47 CEST
Maximum data alignment:               4
Database block size:                  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           C
LC_CTYPE:                             C
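If this output has to be compared across several runs (for example, to see whether the standby's checkpoint locations ever advance), a small parser helps. The sketch below assumes that every field is printed as "Label: value" on a single line, as in the output above, and that the labels are in English. The function name parse_controldata is hypothetical. The sketch splits each line only on the first colon, so values that contain colons, such as timestamps, are kept whole.

```python
# Sketch (hypothetical helper): turn pg_controldata's "Label: value" lines
# into a dict. Assumes one field per line and untranslated (English) labels.


def parse_controldata(text: str) -> dict[str, str]:
    fields = {}
    for line in text.splitlines():
        # partition() splits on the first colon only, so times like
        # "13:20:46" stay intact in the value.
        label, sep, value = line.partition(":")
        if sep:  # ignore blank lines or lines without a colon
            fields[label.strip()] = value.strip()
    return fields


sample = """Latest checkpoint location:           0/190C7E04
Prior checkpoint location:            0/190C7DC0
Time of latest checkpoint:            Fri 27 Apr 2007 11:53:47 CEST
"""

info = parse_controldata(sample)
print(info["Latest checkpoint location"])  # 0/190C7E04
print(info["Time of latest checkpoint"])  # Fri 27 Apr 2007 11:53:47 CEST
```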

Back up all the files in case we need to inspect them.

OK.

What was the ending log sequence number (e.g. x/xxxx) from the previous
recovery? I'll see if I can re-create this.

Judging from the logs, I guess it is 0/190C7E04:
LOG:  restored log file "000000010000000000000019.000C7E04.backup" from archive
LOG:  restored log file "000000010000000000000019" from archive
LOG:  checkpoint record is at 0/190C7E04
LOG:  redo record is at 0/190C7E04; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 3698809; next OID: 68745
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  automatic recovery in progress
LOG:  redo starts at 0/190C7E48
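The file names in this log are consistent with the checkpoint location that pg_controldata reports. With 16 MB segments, location 0/190C7E04 falls in segment 0x19, and the backup history file is named after that segment plus the offset within it (0x0C7E04). The sketch below shows this arithmetic. It assumes the 8.x naming scheme and the default segment size, and the function names are hypothetical.

```python
# Sketch (hypothetical helpers): map an "X/Y" location to the WAL segment
# file that contains it, and to the matching backup history file name.
# Assumes 16 MB segments and 8.x-era file naming.

WAL_SEG_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline: int, lsn: str) -> str:
    log_id, offset = (int(part, 16) for part in lsn.split("/"))
    # timeline, log file ID and segment number, each as 8 upper-case hex digits
    return f"{timeline:08X}{log_id:08X}{offset // WAL_SEG_SIZE:08X}"


def backup_history_name(timeline: int, lsn: str) -> str:
    _log_id, offset = (int(part, 16) for part in lsn.split("/"))
    # segment name, then the byte offset inside that segment
    return f"{wal_file_name(timeline, lsn)}.{offset % WAL_SEG_SIZE:08X}.backup"


print(wal_file_name(1, "0/190C7E04"))        # 000000010000000000000019
print(backup_history_name(1, "0/190C7E04"))  # 000000010000000000000019.000C7E04.backup
print(wal_file_name(1, "0/190C7E48"))        # 000000010000000000000019
```

The redo start point (0/190C7E48) lies in the same segment, so the first file the standby needs is ...19, which matches the restore order in the log.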

What did I do wrong? Is there any other procedure to follow to restart a 
stopped replication?

You're right, using the trigger is not the right way to stop/start the
standby. Just stop/start the standby server normally.

As above: a plain stop hangs.

The trigger means that you'd like to perform a failover.

There is a not-yet-applied patch that will produce a new version of
pg_standby. pg_standby's official status right now is beta, so please
expect, look for and report any issues you find. Thanks.

Thank you.