Greetings, * Rob Emery (re-pgsql@xxxxxxxxxxxxxxx) wrote: > We're pretty sure that we've just hit a scenario where the instructions for > pg_upgrade with standby servers aren't quite safe. I've pretty much always felt those instructions should come with a "expert use only" note. > We were testing our upgrade process with a copy of our live cluster in a > lab environment and we found that using > > rsync --verbose --archive --delete --hard-links --size-only > --no-inc-recursive /media/postgresql/data/main /media/postgresql/data/9.6 > PGRETESTA02:/media/postgresql/data > > Resulted in: > > 2020-12-02 14:49:11.513 GMT [20884-1] LOG: database system was shut > down in recovery at 2020-12-02 13:57:56 GMT > 2020-12-02 14:49:11.513 GMT [20884-2] LOG: entering standby mode > 2020-12-02 14:49:11.557 GMT [20884-3] LOG: consistent recovery state > reached at 19E/25000098 > 2020-12-02 14:49:11.557 GMT [20884-4] LOG: invalid record length at > 19E/25000098: wanted 24, got 0 > 2020-12-02 14:49:11.559 GMT [20883-1] LOG: database system is ready > to accept read only connections > 2020-12-02 14:49:11.593 GMT [20888-1] FATAL: database system > identifier differs between the primary and standby > 2020-12-02 14:49:11.593 GMT [20888-2] DETAIL: The primary's > identifier is 6901669428825624285, the standby's identifier is > 690161835164 > 1138930. > > however if we don't use --size-only, then the process worked fine and we a > smooth upgrade. This doesn't look like a --size-only issue though- how did you end up with this when the pg_controldata file (where the system identifier is pulled from) shouldn't even exist and won't be a hardlink to the existing one on the old system since it's a new cluster, and therefore should definitely be copied? When you go to run the rsync to get the replica up to date you should have a system that looks like this: PRIMARY: /srv/old_cluster ... lots of files /srv/new_cluster ... lots of *new* files, including pg_controldata ... hardlinks to PG table/index files that aren't part of the catalog REPLICA: /srv/old_cluster ... lots of files /srv/new_cluster ... entirely empty Then you run the rsync, at the /srv level, and all those 'new files' in /srv/new_cluster on the primary should get copied over to /srv/new_cluster on the replica, while any files which are hard-linked between old_cluster and new_cluster should end up as hard links on the replica. Thanks, Stephen
Attachment:
signature.asc
Description: PGP signature