On Wed Jul 17, 09:16am +0900, Michael Paquier wrote: > On Wed, Jul 17, 2013 at 8:24 AM, David B Harris <dbharris@xxxxxxxxxxxxx> wrote: > > On Wed Jul 17, 08:12am +0900, Michael Paquier wrote: > > 4. In parallel, WAL archiving has copied all the logs from the > > start of pg_start_backup() through to the end of pg_stop_backup() > Yes exactly. This avoids to have to maintain your own scripts and rely > on the server features... What do you actually do to copy the > necessary WAL files. Do you fetch them directly from the master's > pg_xlog folder? Due to bandwidth constraints separate scripts are going to need to be maintained anyways. I'd love to use pg_basebackup, but since it transfers the entire cluster each time, it's not suitable - something like rsync is going to be used instead. Actually the use of pg_basebackup would require a separate script anyway, so regardless. If the WAL files do need to be copied, I'll be copying them separately yeah. (I'll use archive_command that only runs when backup_label is in place [pg_startup() has been called], put them somewhere safe during the cluster rsync [probably outside of the cluster directory], transfer them after pg_stop_backup(), and then delete them on the source/master server.) > > A configuration like this is provided like this in the documentation > > (again section 24.3), though some assembly is required. I'm hoping to > > avoid it since it'll be introducing a second channel to the backup which > > I can almost guarantee will be forgotten in time. (BTW, we tend to > > consider the lifespan of our installations in terms of decades.) > > > > I mean, if it's necessary, the so be it of course, but the question > > still remains: I've tested a fair bit under fairly adverse conditions > > and not had a single failure. Luck? > > It looks so, and wal_keep_segments is set to a value high enough on > the master side so as all the necessary WAL files are kept intact > somewhere even if they are not archived. I'm going to re-run my tests again, but here's what I did: 1. Set wal_keep_segments to a small value (I think I used 8) 2. Create a new database with a small canary table 2. pg_start_backup() 3. Run a bunch of database drops/loads and table drops/loads (with pg_restore and COPY .. FROM, respectively) ... for hours and hours (writing tens of GBs of data, without question cycling through wal_keep_segments) 4. Take a tarball of the cluster (including pg_xlog/) 5. pg_stop_backup() 6. Restore from tarball, start cluster 7. Check for canary table (successfully) Separately, I also prototyped the backup and I've run it dozens of times occasionally with a heavy write load on the source/master server without errors. Now the write load was likely never enough to saturate all allocatable WAL segments, but given the documentation (which says that the _last_ WAL segment which is synced by pg_stop_backup() must be copied as well), I'd have expected a consistency error or similar. Since somebody has now said "no you're crazy" (paraphrased :), I'm going to double-check again. I didn't mention in the original email, but of course one possibility is that the documentation might be incomplete simply in that if wal_keep_segments is set high enough such that the earliest segment isn't overwritten while the rsync is running, everything would be fine. But (again if I'm reading the documentation correctly), I think there should be consistency errors (again unless I'm just lucky). Maybe if there's an incomplete WAL transaction/segment/whatever, it just silently ignores it. (This would of course make sense.) I'll run the tests again and this time not use a canary table. It occurs to me that what I might be seeing is data files in the cluster not being deleted. If I'm dropping databases and/or tables, my base backup may be getting the old ones but the new ones might be inconsistent/broken. (Though I'd still expect a "cannot replay log" error of some sort at cluster startup.) -- Arguing with an engineer is like wrestling with a pig in mud. After a while, you realise the pig is enjoying it. OpenPGP v4 key ID: 4096R/59DDCB9F Fingerprint: CC53 F124 35C0 7BC2 58FE 7A3C 157D DFD9 59DD CB9F Retrieve from subkeys.pgp.net -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general