prakhar jauhari wrote: > I am facing a similar kind of problem, but in a two node setup. Streaming replication is being used > with a high wal_keep_segments, using log shipping to bump up timelines on the standby server to setup > SR(streaming replication). > DB1 - master > > DB2 - standby > > > When a switchover happens DB2 becomes the new master and when DB1 comes up it will act as the standby > to the new master (.history files from new master are copied to DB1 to bump up its timeline so as to > setup SR). DB1 is not recreated from scratch. This runs fine in normal switchover, but there seems to > be problem in the following situation, leading to database corruption: > > > Current state : > DB1 - master > DB2 - standby > > > Now the failing scenario: > > > > 1. DB2 machine goes down. > > 2. After some time DB1 machine also goes down (DB2 is still down). > > 3. Now DB2 comes up (it will join the cluster as master as DB1 is still down). > > 4. DB2 is started as master postgresql. > > 5. Now DB1 comes up (it will join the cluster as standby to DB2) > > 6. Now when DB1 attempts to SR with DB2, DB1 gets corrupted. > > > Looked into the issue and found that when DB1 went down initially, it created some WAL's which were > not synced to DB2 as it was already down. > > Now when DB2 started as master it still had not played the last few WAL's created by DB1(when it was > master). DB2 starts as master properly. > > When DB1 came as standby to DB2, it bumped it timeline using history file from DB2, but when SR was > setup with DB2, DB1 gets corrupted. > > > Now the question is: > > > 1. Is this a theoretically valid approach? > > 2. If it is a valid approach, then how can i detect such a scenario (where SR will corrupt the DB)? So > that i can go for a basebackup in such situation. If you want to use the old primary as new standby without a new backup, you have to ascertain that all transactions from the former have been replayed at the latter. To figure out where the primary currently is, you can SELECT pg_current_xlog_location(); To figure how much the standby has replayed, you can SELECT pg_last_xlog_replay_location(); Of course this only works if both are up. I think that it would be tricky to automatize that; I'd choose making a new backup after each failover. In the event of a controlled failover it might be an option. I am surprised that the scenario you described leads to corruption; I would have expected an error message. Yours, Laurenz Albe -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin