Problems with PostgreSQL Replication (Log Shipping)

JotaComm <jota.comm@xxxxxxxxx> · Tue, 12 Mar 2013 10:21:07 -0300

Hello, everybody

I have a problem and I need some help.

My environment: one master and one slave (PostgreSQL 9.2.2).

My cluster is about 160GB, and I use pg_basebackup to synchronize the master and the slave.

The syntax is below:

pg_basebackup -h productionaddress -p productionport -U productionuser -D datadirectory -P -v

My recovery.conf:

standby_mode = 'on'

primary_conninfo = 'host=productionaddress port=productionport user=productionuser'

archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'

My postgresql.conf (master):

wal_level = hot_standby

checkpoint_segments = 10

archive_mode = on

archive_command = 'rsync -Crap %p postgres@slaveaddress:/slave/transactionlogs/%f'

max_wal_senders = 1

wal_keep_segments = 50

My postgresql.conf (slave):

checkpoint_segments = 10

hot_standby = on

On my slave I get the following errors. My first attempt:

2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG:  database system was interrupted; last known up at 2013-03-07 15:55:43 BRT

2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG:  entering standby mode
2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG:  streaming replication successfully connected to primary
2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG:  consistent recovery state reached at 141/8FBB5F0

2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG:  redo starts at 141/2251F90
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/001D" at offset 49152: Success.

2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT:  xlog redo commit: 2013-03-07 15:55:40.673623-03
2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG:  startup process (PID 11817) exited with exit code 1
2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG:  terminating any other active server processes

On my slave I get the following errors. My second attempt:

2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG:  database system was interrupted; last known up at 2013-03-11 12:06:31 BRT
2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG:  entering standby mode

2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG:  streaming replication successfully connected to primary
2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG:  consistent recovery state reached at 168/816AE10
2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG:  redo starts at 167/FEC3D828

2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/0022" at offset 212992: Success.

2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT:  xlog redo commit: 2013-03-11 12:05:35.069759-03
2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG:  startup process (PID 5862) exited with exit code 1
2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG:  terminating any other active server processes
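
For reference, the positions in the "redo starts at" and "consistent recovery state reached at" lines can be mapped to WAL segment file names, which shows which segments the slave needs in order to replay from the base backup. This is only a rough sketch: it assumes the default 16MB WAL segment size and timeline ID 1, neither of which appears in the logs, and the function name wal_file_for is made up for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # default segment size (assumption)
TIMELINE = 1                          # assumed timeline ID; not shown in the logs


def wal_file_for(lsn, timeline=TIMELINE):
    """Map a position printed as 'xlogid/xrecoff' (hex) to a WAL segment file name."""
    xlogid_hex, xrecoff_hex = lsn.split("/")
    xlogid = int(xlogid_hex, 16)
    # The segment number is the offset within the log file divided by the segment size.
    segment = int(xrecoff_hex, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, xlogid, segment)


# Positions taken from the two attempts in the logs above.
for lsn in ("141/2251F90", "141/8FBB5F0", "167/FEC3D828", "168/816AE10"):
    print("%s -> %s" % (lsn, wal_file_for(lsn)))
```

Under those assumptions, the first attempt starts redo in segment 000000010000014100000002 and the second in 0000000100000167000000FE.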

Both attempts failed with the same error, but on different pg_clog files:

First attempt:

2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/001D" at offset 49152: Success.

Second attempt:

2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/0022" at offset 212992: Success.
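
As a cross-check, the pg_clog file name and offset in each DETAIL line can be derived from the transaction ID in the FATAL line. The sketch below assumes the default 8kB block size and the standard commit log layout (2 status bits per transaction, 32 pages per file); the function name clog_location is made up for illustration.

```python
# Commit log layout constants, assuming a default build (8 kB blocks).
BLOCK_SIZE = 8192          # bytes per pg_clog page
XACTS_PER_BYTE = 4         # 2 status bits per transaction
PAGES_PER_SEGMENT = 32     # pages per pg_clog file

XACTS_PER_PAGE = BLOCK_SIZE * XACTS_PER_BYTE


def clog_location(xid):
    """Return (file name, byte offset of the page) that holds the status of xid."""
    page = xid // XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    offset = (page % PAGES_PER_SEGMENT) * BLOCK_SIZE
    return "%04X" % segment, offset


# Transaction IDs from the two FATAL lines above.
for xid in (30622931, 36529670):
    name, offset = clog_location(xid)
    print("xid %d -> pg_clog/%s at offset %d" % (xid, name, offset))
```

Both results (pg_clog/001D at offset 49152 and pg_clog/0022 at offset 212992) match the DETAIL lines, so in each attempt the slave is reading the page where that transaction's status belongs.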

When I first created the cluster and ran this test, it worked. Now that the cluster is about 160GB, I tried starting replication again and I get these problems.

Any ideas?

Thank you.

Best Regards

João Paulo

-- 
JotaComm
http://jotacomm.wordpress.com