Hi Stephen,
> When PostgreSQL in the mode of Start Backup, PostgreSQL only writes to the
> XLOG, then you can safely rsync / copy the base data (snapshot) then later
> you can have full copy of snapshot backup data.
You are confusing two things.
After calling pg_start_backup, you can safely copy the contents of the
data directory, that is correct.
However, PostgreSQL *will* continue to write to the data directory.
That, however, is ok, because those changes will *also* be written into
the WAL and, after calling pg_start_backup(), you collect all of the
WAL using archive_command or pg_receivexlog.
Thanks for elaborating on this; it is new to me. Either way, the procedure is correct and workable.
With all of the WAL
which was created during the backup, PG will be able to recover from the
changes made during the backup to the data directory, but you *must*
have all of that WAL, or the backup will be inconsistent because of
those changes that were made to the data directory after
pg_start_backup() was called.
That is beside the point, because all we are discussing here is taking a full (snapshot) backup.
The backup is a full backup, or snapshot, and it will work whenever it is needed.
We are not talking about incremental backups yet.
By also collecting the XLOG files, you get incremental backups and a complete, continuous backup of the data.
In this case, Stephen is suggesting using pg_receivexlog or archive_command
(all of this is explained well in the docs).
In other words, if you aren't using pg_receivexlog or archive_command,
your backups are invalid.
I doubt that *invalid* is the right word here.
In terms of a snapshot backup, as long as the snapshot can be started, it is valid, isn't it?
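To make the order of operations discussed above concrete, here is a minimal sketch that only assembles the commands for a low-level base backup and does not run them. It assumes the exclusive backup mode and the function names used in this thread (pg_start_backup / pg_stop_backup, as in the PostgreSQL releases that ship pg_receivexlog), and it assumes that archive_command or pg_receivexlog is already collecting WAL. The helper names, label, and paths are hypothetical.

```python
def sql_literal(text):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def low_level_backup_plan(data_dir, backup_dir, label):
    """Return the commands for a low-level base backup, in order.

    Nothing is executed here. Each entry is an argv list that could be
    handed to subprocess.run() once WAL archiving is known to be working.
    """
    return [
        # 1. Tell PostgreSQL that a base backup is starting.
        ["psql", "-c", "SELECT pg_start_backup(" + sql_literal(label) + ")"],
        # 2. Copy the data directory; the server keeps writing to it meanwhile.
        ["rsync", "-a", data_dir.rstrip("/") + "/", backup_dir],
        # 3. End the backup. All WAL generated between steps 1 and 3 must be
        #    kept with the copy, otherwise the copy cannot be made consistent.
        ["psql", "-c", "SELECT pg_stop_backup()"],
    ]


if __name__ == "__main__":
    # Hypothetical paths and label, for illustration only.
    for argv in low_level_backup_plan("/var/lib/postgresql/data", "/backup/base", "nightly"):
        print(" ".join(argv))
```

The copy made in step 2 is only usable together with the WAL written between steps 1 and 3, which is why the plan is not complete without archive_command or pg_receivexlog running alongside it.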
> if you wanted to backup in later day, you can use rsync then it will copy
> faster because rsync only copy the difference, rather than copy all the
> data.
This is *also* incorrect. rsync, by itself, is *not* safe to use for
doing that kind of incremental backup, unless you enable checksums
(rsync's --checksum option). The reason for this is that rsync's default
comparison of file size and modification time has only 1-second
granularity, so it is possible (unlikely, though it has been
demonstrated) to miss changes made to a file within that 1-second window.
As long as that is not an XLOG file, anyway. As you said, that would not be a problem, since we can replay the XLOG for recovery.
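The 1-second window can be illustrated with a small, self-contained sketch. It does not call rsync. It only approximates a "same size and same whole-second modification time" comparison and contrasts it with a content checksum. The function names are hypothetical, and SHA-256 stands in for whatever checksum rsync uses internally.

```python
import hashlib
import os
import tempfile


def quick_check_equal(a, b):
    """Approximate a size + whole-second mtime comparison of two files."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def checksum_equal(a, b):
    """Compare two files by content digest (SHA-256 as a stand-in)."""
    def digest(path):
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    return digest(a) == digest(b)


def demo():
    """Return (quick_check_equal, checksum_equal) for a same-second change."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        for path in (src, dst):
            with open(path, "wb") as fh:
                fh.write(b"AAAA")          # dst plays the role of the earlier backup copy
            os.utime(path, (1000, 1000))   # pin atime and mtime to the same second
        # The source is modified again within that same second, same size.
        with open(src, "wb") as fh:
            fh.write(b"BBBB")
        os.utime(src, (1000, 1000))
        return quick_check_equal(src, dst), checksum_equal(src, dst)


if __name__ == "__main__":
    print(demo())
```

Here the size and timestamp comparison reports the two files as unchanged, while the checksum comparison sees that the contents differ, which is the case the checksum option is meant to catch.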
> my latter explanation is: use pg_basebackup, it will do it automatically
> for you.
Yes, if you are unsure about how to perform a safe backup properly,
using pg_basebackup or one of the existing backup tools is, by far, the
best approach. Attempting to roll your own backup system based on rsync
is not something I am comfortable recommending any more because it is
*not* simple to do correctly.
OK, that is fine, and actually we are using that.
The reason I explained start_backup and stop_backup was to build understanding gradually, in the hope that people will grasp the underlying mechanism.
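For comparison with the manual procedure, a pg_basebackup call of the kind recommended above could be assembled roughly as follows. This is a sketch, not a complete backup script. The function name, target directory, host, and user are hypothetical, and the options shown (-D, -X stream, -P, -h, -U) are just one reasonable choice.

```python
def pg_basebackup_argv(target_dir, host=None, user=None):
    """Build an argv list for pg_basebackup without running it.

    -D sets the output directory, -X stream also fetches the WAL generated
    during the backup over a second connection, and -P shows progress.
    """
    argv = ["pg_basebackup", "-D", target_dir, "-X", "stream", "-P"]
    if host:
        argv += ["-h", host]
    if user:
        argv += ["-U", user]
    return argv


if __name__ == "__main__":
    # Hypothetical target directory, host, and role, for illustration only.
    print(" ".join(pg_basebackup_argv("/backup/base", host="db1", user="replicator")))
```

Because -X stream brings along the WAL written while the copy is running, the result is usable on its own, which is the part that is easy to get wrong when rolling the start/copy/stop steps by hand.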
Thanks!
Thanks for your great explanation!
Stephen