Greetings, * Jerry Sievers (gsievers19@xxxxxxxxxxx) wrote: > Suppose we have a DB cluster with an additional tablespace and we are > able to make an atomic SAN snapshot of *only* the main cluster > volume... > > The additional tablespace contains only UNLOGGED relations. > > We cannot snap that volume so we use rsync as follows... > > 1. pg_start_backup('foo'); > make SAN snapshot > rsync the add'l tablespace > pg_stop_backup() Surely you're also doing WAL archiving? You shouldn't ever be using start/stop backup without also doing WAL archiving as you must capture all of the WAL between the two. > Now provision a new cluster around the snapshot and rsync'd volume, > rejigger the pg_tblspc link if necessary... and start it up maybe or > not having it remain as a streaming replica. Dedicated backup tools know how to do tablespace remapping. > It's been my experience that possibly bulky data in the additional > tablespace does *not* need be rsync'd if we capture only the *_init > files. That isn't a trivial thing to do with rsync tho and "try it and see if it works" is a really bad way to go about this- it's important to understand what the files are in the data directory, how PG uses them, how PG recovery works, etc. That's why there are folks who write dedicated backup tools for PG (myself included). > Id' be curious to here feedback re the sanity of this approach. I wouldn't recommend it is the short answer. Note that you must be doing WAL archiving and you really should (and will have to, as of 15) use non-exclusive backup when you're doing this and be sure to copy the backup_label file from the pg_stop_backup/pg_backup_stop (in v15) results and put that into the snapshot. If you use exclusive mode then if the system crashes for any reason during the rsync then you'll have a good chance of ending up with a system that won't come back up until you go muck around in the data directory. It's not good and is why it's now been removed. Also, you'd have to craft a pretty ugly rsync to make it not copy data from the unlogged tables, and when you end up with a logged table in that tablespace (and you will, unless you're actively monitoring for it and remove any that pop up...) you could end up with data corruption. Don't think that you'll be able to use delta rsyncs with this either, as you may end up with files being changed without rsync realizing it and again end up with corruption. > And would also like to know if perhaps *only* the directories under > the rsync'd tablespace actually must be present for a successful > recovery. > > The above approach has worked mumerous times even with origin systems > having large, churny contents in the dedicated unlogged tablespace > (which is on a much faster local NVME volume than the main SAN > volume.) Dedicated backup tools already know how to recognize unlogged tables and skip them, along with being able to do incremental backup and delta restores, I'd strongly suggest you consider using such tools instead of trying to hack your own. Thanks, Stephen
Attachment:
signature.asc
Description: PGP signature