
PG 9.3.12: Replication appears to have worked, but getting error messages in logs

Hello,

We're using streaming replication. Our technique for spinning up a db
slave is this:

rsync from master (gross copy)
pg_start_backup() on master
rsync from master (correct copy)
pg_stop_backup()
drop recovery.conf into slave directory
enable hot_standby in slave conf
start slave

After starting the slave, I'm getting this error every 5 seconds in the log:

ERROR:  requested WAL segment 0000000100000E2200000005 has already been removed

But I can connect to the DB and make queries and new records are
definitely streaming in.

I thought I just didn't have enough wal segments, so I bumped up the
number on the master and restarted the process. It just finished the
second time and the exact same error message is in the logs again (same
wal segment number).


When I ran pg_start_backup() and pg_stop_backup(), the output was:

 pg_start_backup
-----------------
 E27/3100A200

 pg_stop_backup
----------------
 E28/7D357950


The backup_label file looked like this:

START WAL LOCATION: E27/3100A200 (file 0000000100000E2700000031)
CHECKPOINT LOCATION: E27/31C9C740
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-04-02 12:34:25 PDT
LABEL: clone
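
As a cross-check on the backup_label above, the segment file name can be
derived from the start location. This is only a minimal sketch in Python,
assuming the default 16 MB WAL segment size and the 9.3-style file naming
(timeline, high half of the location, segment index); the function name
wal_segment_name is made up for illustration:

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def wal_segment_name(lsn, timeline=1):
    """Map an 'X/Y' WAL location to its 24-hex-digit segment file name."""
    high_hex, low_hex = lsn.split("/")
    high = int(high_hex, 16)             # high 32 bits of the location
    low = int(low_hex, 16)               # low 32 bits of the location
    segment = low // WAL_SEGMENT_SIZE    # which 16 MB slice of that range
    return "%08X%08X%08X" % (timeline, high, segment)


if __name__ == "__main__":
    print(wal_segment_name("E27/3100A200"))
```

For the start location above this gives 0000000100000E2700000031, which
matches the file named in backup_label.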


During the rsync it copied the segments from
  pg_xlog/0000000100000E25000000F1
through
  pg_xlog/0000000100000E2800000071


So I'm confused: why is the E22 wal being requested? It seems to predate
the backup by a lot.
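
To put a number on "a lot", the two segment names can be compared
directly. A rough sketch, assuming 16 MB segments and 256 segments per
log id (which I believe is right for 9.3); the helper names are
hypothetical:

```python
SEGMENT_SIZE = 16 * 1024 * 1024                  # assumption: default size
SEGMENTS_PER_LOG_ID = 0x100000000 // SEGMENT_SIZE  # 256 with 16 MB segments


def parse_segment(name):
    """Split a 24-hex-digit WAL file name into (timeline, linear segment no)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log_id * SEGMENTS_PER_LOG_ID + seg


def segments_behind(requested, backup_start):
    """How many segments 'requested' lies before 'backup_start'."""
    t_req, n_req = parse_segment(requested)
    t_start, n_start = parse_segment(backup_start)
    if t_req != t_start:
        raise ValueError("segments are on different timelines")
    return n_start - n_req


if __name__ == "__main__":
    gap = segments_behind("0000000100000E2200000005",
                          "0000000100000E2700000031")
    print(gap, "segments,", gap * SEGMENT_SIZE // (1024 * 1024), "MB")
```

By this count the requested segment is 1324 segments, roughly 20 GB of
WAL, before the backup's start segment.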

Does the slave really contain all the data? If not, how can I tell what
is missing (and why is it accepting streaming data if it's missing
something)?


One more piece of the puzzle that may or may not be relevant:

The current master used to be a streaming replication slave. The original
master had a disk failure, so we promoted one of the backup slaves to
master. We've replaced the disk on the original server and we're
now trying to make it a streaming replication slave. This is the part
that's failing. If I do rough estimates of how fast the Exx number is
incrementing and compute backwards, E22 seems like about the time of the
original disk failure, give or take.


Thanks,
  David

