Hi Andres,
Thank you very much for your response.
Full details on our PostgreSQL version are:
Name : postgresql-server
Arch : x86_64
Version : 9.2.13
Release : 1.el7_1
Size : 3.8 M
Repo : rhel-7-server-dvd-rpms
Summary : The programs needed to create and run a PostgreSQL server
License : PostgreSQL
Description : PostgreSQL is an advanced Object-Relational database management system (DBMS).
: The postgresql-server package contains the programs needed to create
: and run a PostgreSQL server, which will in turn allow you to create
: and maintain PostgreSQL databases.
We generally promote a slave to master as a reaction to our healthchecker determining that the master is dead. This does not happen often "in the wild", but we have a substantial number of automated and manually operated QA setups where these situations are provoked. To promote, we simply touch the trigger file. Typically, our monitoring process will then determine that the new master is ready but that there is no slave available to stream to, so it turns off sync replication (by setting synchronous_standby_names to 'none') in order for the new master to be workable. Some time later, the old master is recycled and joins as a slave, at which point the monitoring process turns sync replication back on (by setting synchronous_standby_names to the IPs of the master and slave).
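For concreteness, here is a minimal sketch of that sequence as shell commands; the postgresql.conf path, the sed edits and the variable names ($pgData, $masterIP, $slaveIP) are illustrative stand-ins for our actual tooling, not verbatim from it:

# 1. promote: touch the trigger file named in recovery.conf
touch /var/run/postgresql_trigger

# 2. degrade: let the lone new master commit without a sync standby
#    (9.2 has no ALTER SYSTEM, so we edit postgresql.conf and reload;
#    synchronous_standby_names can be changed with a reload)
sed -i "s/^synchronous_standby_names.*/synchronous_standby_names = 'none'/" $pgData/postgresql.conf
sudo -u postgres psql -c "SELECT pg_reload_conf();"

# 3. restore: once the old master is streaming again as a slave,
#    re-enable sync replication against the application_names (host IPs)
sed -i "s/^synchronous_standby_names.*/synchronous_standby_names = '$masterIP,$slaveIP'/" $pgData/postgresql.conf
sudo -u postgres psql -c "SELECT pg_reload_conf();"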
In order to set up a slave (and to recycle an old master), we:
- stop postgresql
- copy a full base backup to a staging area, with the command:
sudo -u postgres pg_basebackup --no-password --host=$ip --label=Basebackup-$hostIP --username=root --pgdata=$stagingDir --xlog-method=stream
- wipe the postgresql data dir, with the command:
"find $pgData/* | grep -v server.crt | grep -v server.key | grep -v PG_VERSION | xargs rm -rf".
- move data from the staging dir to postgresql data dir
- remove a file called backup_label, but I am not certain that this file is in fact there (any more).
- remove the trigger file.
- generate the recovery.conf, with this content:
standby_mode = 'on'
primary_conninfo = 'host=$masterIP port=5432 user=$user password=$password application_name=$hostIP sslmode=require'
recovery_target_timeline = 'latest'
trigger_file = '/var/run/postgresql_trigger'
- start postgresql
- wait for it to become "streaming" in pg_stat_replication on the master (polled roughly as sketched below)
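The poll in that last step looks roughly like this (the psql invocation and the 10-second interval are assumptions, not our exact code):

until sudo -u postgres psql -h $masterIP -Atc "SELECT 1 FROM pg_stat_replication WHERE application_name = '$hostIP' AND state = 'streaming'" | grep -q 1
do
    sleep 10   # retry until the master reports this standby as streaming
done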
... it should be noted that while this is going on, there are application servers working on the master database (sometimes relatively hard).
The log messages look something like the following (on the new slave):
Sep 20 14:57:13 ... LOG: database system was interrupted; last known up at 2016-09-20 14:57:11 IST
...
Sep 20 15:00:27 ... LOG: entering standby mode
Sep 20 15:00:27 ... LOG: database system was not properly shut down; automatic recovery in progress
Sep 20 15:00:27 ... LOG: redo starts at 12/143FA3A8
Sep 20 15:00:27 ... LOG: 12 KnownAssignedXids (num=12 tail=0 head=12) [0]=38572 [1]=38573 [2]=38574 [3]=38575 [4]=38576 [5]=38577 [6]=38578 [7]=38579 [8]=38580 [9]=38581 [10]=38582 [11]=38583
Sep 20 15:00:27 ... CONTEXT: xlog redo Standby/RUNNING_XACTS: nextXid 38585 latestCompletedXid 38571 oldestRunningXid 38572; 14 xacts: 38573 38575 38579 38578 38574 38581 38580 38576 38577 38572 38582 38584 38583 38583
Sep 20 15:00:27 ... FATAL: out-of-order XID insertion in KnownAssignedXids
Sep 20 15:00:27 ... CONTEXT: xlog redo Standby/RUNNING_XACTS: nextXid 38585 latestCompletedXid 38571 oldestRunningXid 38572; 14 xacts: 38573 38575 38579 38578 38574 38581 38580 38576 38577 38572 38582 38584 38583 38583
Sep 20 15:00:27 ... LOG: startup process (PID 8053) exited with exit code 1
Sep 20 15:00:27 ... LOG: terminating any other active server processes
As you can see, it spends a few minutes starting up, during which I assume it is replaying the changes from the master that occurred between the base backup and the startup. During this period it receives connection attempts and logs a FATAL message that it is starting up (so I assume it does not process requests, which is fine).
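A loop like the following could be used to wait out that window, assuming hot_standby = on on the slave (the 5-second interval is illustrative):

until sudo -u postgres psql -Atc "SELECT pg_is_in_recovery();" >/dev/null 2>&1
do
    sleep 5   # connections are rejected with FATAL until replay reaches a consistent state
done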
I have just found at least one place where the new master logs the following, after the new slave has begun starting up but before it has finished:
Sep 20 14:58:13 ... ERROR: prepared transaction with identifier "131077_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQpzEwLjE3OS4yMjYuNjk=_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQrgAAAAEAAAAA" is busy
Sep 20 14:58:13 ... ROLLBACK PREPARED '131077_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQpzEwLjE3OS4yMjYuNjk=_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQrgAAAAEAAAAA'
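As a diagnostic (not something our tooling does today), the prepared transactions pending on the master at that moment could be listed with:

sudo -u postgres psql -c "SELECT gid, prepared, owner, database FROM pg_prepared_xacts;"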
In order to recover from this situation, we basically wipe the system and start over from a backup. This is not *as* bad as it sounds, but it is pretty painful nonetheless. It should be noted that we have not spent a significant amount of time determining the best way to recover, since we have been focused on how to avoid the situation altogether.
I am not certain which xlog file it is having difficulties with.
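If it helps answer that: one way to translate the "redo starts at 12/143FA3A8" LSN from the slave's log into a segment file name would be to ask the new master, with the caveat that the returned name carries the master's current timeline:

sudo -u postgres psql -h $masterIP -Atc "SELECT pg_xlogfile_name('12/143FA3A8');"
# the named segment should then be under $pgData/pg_xlog/ on the master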
Best regards,
Fredrik
On October 18, 2016 at 8:49:29 pm +02:00, Andres Freund <andres@xxxxxxxxxxx> wrote:
Hi,

On 2016-10-18 14:57:52 +0200, fredrik@xxxxxxxxxxxxx wrote:
> we are running many postgresql master/slave setups. The slaves are
> initialised from a pg_basebackup from the master and are sync
> streaming from the master. When we determine the master has failed,
> the slave is promoted. Some time after that, the old master is again
> initialised with a pg_basebackup and starts streaming from the new
> master.

Could you describe in a bit more detail how exactly you're setting up
the standbys? E.g. the exact recovery.conf used, whether you remove any
files during starting a standby. Also how exactly you're promoting
standbys?

> Recently, we have gotten this error a fair amount of times:
> "out-of-order XID insertion in KnownAssignedXids" when postgresql
> attempts to start after being initialised with a pg_basebackup from
> the current master.

Which version are you encountering this on precisely?

> Once the issue has occurred, a subsequent re-initialisation (with a
> completely new pg_basebackup) does not resolve the issue.

How have you recovered from this so far?

> I have a setup in the failing state, so I can produce any kind of log
> messages / details that would be helpful.

Could you use pg_xlogdump to dump the WAL file on which replay failed?
And then attach the output in a compressed manner?

Greetings,

Andres Freund