
pg_upgrade (and recovery) pitfalls

Hello -general!

(didn't want to pester -hackers with this, as I don't know if I've been doing something terribly wrong ;-)


Consider the following scenario/setup:
- 4 DB servers in 2 DCs
  - 1 primary (in DC1)
  - 1 sync secondary (in the other DC)
  - 2 async secondaries (distributed over the DCs)
- 1 server running BARMAN for backups/archiving

- Puppet takes care of the PG config, recovery.conf etc.

- ssh connections are only allowed and enabled to and from the BARMAN host/user; there is no ssh between the DB hosts.

- as we switch between servers pretty often, BARMAN provides a single archive for all 4 servers. Only the primary does WAL archiving (a rough config sketch follows below this list).

- General requirements are:
  - *always* have one sync secondary online (no exceptions)
  - best possible availability ==> shortest possible downtime
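
To illustrate the idea, the relevant plumbing looks roughly like this (a simplified sketch using the barman-cli wrappers rather than our actual Puppet templates; host/server names like "barmanhost"/"pgmain" are placeholders):

# postgresql.conf on the current primary (only the primary archives)
archive_mode = on
archive_command = 'barman-wal-archive -U barman barmanhost pgmain %p'

# recovery.conf on each secondary (pre-v12 style), managed by Puppet
standby_mode = 'on'
primary_conninfo = 'host=<primary service IP> user=replication'
restore_command = 'barman-wal-restore -U barman barmanhost pgmain %f %p'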



The approach for major upgrades is to upgrade 2 servers and see if everything works as expected, then
- either follow up with the remaining secondaries, or
- promote one of them and carry on with the old version.

Since we always have 2 spare secondaries to fall back to, pg_upgrade in "--link" mode is a viable approach, as it's nice and fast.


The "naive" idea was to shutdown all instances (starting with the primary to enable final syncs), run "pg_upgrade -k" on both the former primary and the former sync secondary, re-link the recovery.conf on the secondary, re-enable the "primary" IP and start both.

D'oh! The secondary is complaining about a different cluster identifier:
"FATAL:  database system identifier differs between the primary and standby"

(From looking at the code, I could not determine straight away when and how this identifier is generated, but I guess it is somewhere in initdb.c?)
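
(The value itself is easy to read back on both clusters, by the way; paths/versions below are placeholders:)

# from the control file (works on a stopped cluster as well)
/usr/pgsql-11/bin/pg_controldata /var/lib/pgsql/11/data | grep 'system identifier'

# or on a running instance (9.6+)
psql -XAtc "SELECT system_identifier FROM pg_control_system();"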

So, as we can't rsync (no ssh...), which would probably finish in a few seconds, a pg_basebackup is due, which can be a PITA when the DB size is approaching a TB and you have a single 1GB/sec connection. Bye-bye, availability (remember the primary requirement?).

==> So by now, we're only pg_upgrade'ing the primary and following up with a pg_basebackup to the secondary, planning for much longer downtimes. <==
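
I.e. something along these lines on the secondary (sketch; host and paths are placeholders):

# clear the old data directory, then re-clone from the upgraded primary
rm -rf /var/lib/pgsql/11/data/*
pg_basebackup -h primary.example.org -U replication \
  -D /var/lib/pgsql/11/data -X stream -c fast -P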


After finishing the pg_basebackup, re-link the recovery.conf and start.
The recovery finds history files from higher timelines in the archive, starts to recover those (?) and then complains that the timeline doesn't match (don't take the numbers here too seriously; this is from a low-traffic test system, and the fork off TL 1 was at least a year ago):

restored log file "00000002.history" from archive
restored log file "00000003.history" from archive
restored log file "00000004.history" from archive
FATAL:  requested timeline 3 is not a child of this server's history
DETAIL:  Latest checkpoint is at 9C/36044D28 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 69/88000000.

This mess can probably be cleaned up manually (delete the 000000[234].history files on both the secondary and in the BARMAN archive); however, to be 100% safe (or when you're inexperienced), you take another basebackup :/

And - after moving the *.history files out of the way in the archive - the secondary finally starts and begins receiving WAL.
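
The cleanup itself boils down to something like this (sketch; the BARMAN WAL directory shown is just the default location and "pgmain" a placeholder, so double-check before touching anything):

# on the rebuilt secondary, if any stale history files were restored into pg_wal
rm -f /var/lib/pgsql/11/data/pg_wal/0000000[234].history

# in the BARMAN archive: move the stale history files aside instead of deleting
mkdir -p /var/lib/barman/pgmain/wals/old_timelines
mv /var/lib/barman/pgmain/wals/0000000[234].history /var/lib/barman/pgmain/wals/old_timelines/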

Sidenote: this second problem happened to us as well after a colleague promoted a secondary for some R/W tests in the morning (without first disabling the archive_command) and then re-built it using pg_basebackup in the afternoon.
I have to say that it took me quite a while to figure out what was going on there... sudden timeline jumps, seemingly out of nowhere?!?


Now, questions/wishes:
- did I do something obvious(ly) wrong?

- why does a recovery, based on a recovery.conf that points to a reachable primary (which obviously communicates its own timeline), still look for higher timelines' history files in the archive and try to jump onto those timelines? This doesn't seem reasonable to me at all... (a recovery.conf sketch follows after these questions)

- is there a way to have pg_upgrade/initdb use a particular "database system identifier" or some kind of "IV" for the new instance, allowing for identical upgrades on primary and secondary? Apart from that number (and the system tables' OIDs), the data directories should be identical, as far as I understand pg_upgrade's functionality? 
  (- and where do I find that identifier?)

- is there a way to have pg_upgrade/initdb initialize the new cluster on the same (or a higher) timeline, to prevent f***ing up the archive? If not, I'd certainly appreciate such an option! (I'm well aware that I need a new basebackup after the upgrade anyway, but alas!)

- any other hints (apart from rsync or other ssh-based methods) on how I could get around the pg_basebackup would be highly appreciated. Something that only clones the system tables and not the bulk of the (identical) data files?
- is my situation (mandatory sync secondary) so unusual that nobody ever thought about the above? ;-)
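
Regarding the timeline-scanning question above: the only related knob I found is recovery_target_timeline; if that happens to be set to 'latest', pinning it to the cluster's actual timeline might be worth a test, though I'd still like to understand the behaviour. A sketch (pre-v12 recovery.conf syntax, values are placeholders):

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=<primary service IP> user=replication'
restore_command = 'barman-wal-restore -U barman barmanhost pgmain %f %p'
# follow only this specific timeline instead of the highest one found in
# the archive ('1' stands in for whatever TL the cluster is actually on)
recovery_target_timeline = '1'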

Cheers & best regards,

Gunnar "Nick" Bluth



