We're not doing this long-term in order to have a backup server we can fail over to, but rather as a one-off, low-impact move of our database. Consequently, instead of using pg_start_backup and pg_stop_backup and keeping all WAL, we're stopping the database, rsyncing everything, and starting the database on the new server, so that it appears to the new server (if it were capable of noticing such things) that it had simply been shut down and restarted.
This is fine. If the database is shut down, the backup is completely safe. You can bring up the cluster as it was at backup time without any issues.
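As a rough sketch of that cold-copy sequence (stop, rsync, start), the Python below only builds the commands and prints them; it runs nothing. The data directory path, the host name "newhost", and the use of pg_ctl rather than an init script are assumptions for illustration, not details from this thread.

```python
import shlex

def cold_copy_plan(datadir, new_host):
    """Return the shell commands for a stop / rsync / start move.

    Nothing is executed here; the caller decides how to run each step.
    """
    # Trailing slash: rsync copies the directory contents, not the directory itself.
    src = datadir.rstrip("/") + "/"
    dest = f"{new_host}:{src}"
    return [
        # 1. Clean shutdown on the old server so the files on disk are consistent.
        ["pg_ctl", "stop", "-D", datadir, "-m", "fast"],
        # 2. Final rsync; --delete removes files that no longer exist on the source.
        ["rsync", "-a", "--delete", src, dest],
        # 3. Start PostgreSQL on the new server against the copied directory.
        ["ssh", new_host, "pg_ctl", "start", "-D", datadir],
    ]

if __name__ == "__main__":
    # Hypothetical path and host name.
    for cmd in cold_copy_plan("/var/lib/pgsql/data", "newhost"):
        print(shlex.join(cmd))
```

Printing the plan first makes it easy to review the exact commands before anything touches the production server.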
The initial and repeated rsyncs, made while the first server is running and in use, are solely to reduce the time the final rsync takes while PostgreSQL is stopped.
Do you still think we need to do anything special with pg_start_backup, pg_stop_backup, and WAL archives?
Yes. After the initial sync, if the repeated rsyncs are performed while the database cluster is up and running, then "pg_start_backup()-rsync-pg_stop_backup()" (as said earlier) must be performed. This lets Postgres know that a backup is in progress. When you call pg_start_backup(), Postgres performs a checkpoint and creates a backup label file that records the checkpoint time and the WAL location at which the backup started. So the WAL archives generated during that time are needed for recovery (to recover any half-written changes).
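The "pg_start_backup()-rsync-pg_stop_backup()" bracket for one rsync pass taken while the cluster is running might look like the sketch below. It only builds the commands. The label, path and host name are hypothetical, and it assumes a PostgreSQL release that still provides pg_start_backup() and pg_stop_backup(), with WAL archiving already configured.

```python
import re

def online_rsync_plan(datadir, new_host, label="rsync_move"):
    """Return the commands for one rsync pass taken while PostgreSQL is running."""
    # Keep the label simple so it can be embedded in the SQL string safely.
    if not re.fullmatch(r"[A-Za-z0-9_]+", label):
        raise ValueError("label must contain only letters, digits and underscores")
    src = datadir.rstrip("/") + "/"
    return [
        # Tell Postgres a base backup is starting (forces a checkpoint).
        ["psql", "-c", f"SELECT pg_start_backup('{label}');"],
        # Copy the data directory; postmaster.pid belongs to the running server
        # and should not be carried over to the new machine.
        ["rsync", "-a", "--delete", "--exclude=postmaster.pid",
         src, f"{new_host}:{src}"],
        # Always end the backup, even if the rsync step failed.
        ["psql", "-c", "SELECT pg_stop_backup();"],
    ]

if __name__ == "__main__":
    for cmd in online_rsync_plan("/var/lib/pgsql/data", "newhost"):
        print(cmd)
```

Whatever runs these steps should make sure pg_stop_backup() is called even when rsync exits with an error, so the cluster is not left in backup mode.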
Without doing pg_start_backup, and with rsync not performing a "snapshot" backup, my assumption is that until we do an rsync with the service shut down, whatever we've got at the location we're copying to is not self-consistent.
The above explanation should answer this.
If we start up postgresql on it, won't it think it is recovering from a sudden crash? I think it may either appear to recover ok, or complain about various things, and not start up ok, with neither option providing us with much insight, as all that could tell us is that either some disk blocks are consistent, or some are not, which is our starting assumption anyway.
Starting up postgresql would probably result in more disk block changes that will result in more work next time we rsync.
This is normal behavior for rsync. It all depends on how volatile your system is and on the volume of changes performed.
How badly can we screw things up, given we intend to perform a final rsync with no postgresql services running? What should we try and avoid doing, and why?
We might simply compare some hashes between the two systems, of some files that haven't had their last-modified dates changed since the last rsync.
All this will be taken care of by Postgres, with the help of the WAL archive files generated while you performed the rsync with the Postgres services up and running.
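If you still want the hash comparison you describe as an extra check, a minimal sketch could be the following. The function names are made up for illustration; you would run tree_hashes() on each server and compare the two results. The comparison is only meaningful for files that are not being written to at the time, for example after the final rsync with the services stopped.

```python
import hashlib
from pathlib import Path

def file_sha256(path, chunk_size=1024 * 1024):
    """Hash one file in chunks so large relation files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def tree_hashes(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def differing_files(hashes_a, hashes_b):
    """Relative paths whose hash differs or that exist on only one side."""
    return sorted(
        name
        for name in hashes_a.keys() | hashes_b.keys()
        if hashes_a.get(name) != hashes_b.get(name)
    )
```

An empty list from differing_files() means both trees hold the same files with the same contents.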
Thanks
VB