> Rajesh Kumar Mallah wrote:
> |>| Do you see any problem in the current approach ?
> |>| i have seen it working fine till now.
> |>
> |>I do, to be honest. The WAL location counter accounts for 4294967295
> |>positions and while I'm certain that's WAY more than the average number
> |>of transactions that go into a WAL, quite a number of small ones can
> |>certainly happen before a WAL is rolled over, and until then, you're
> |>dealing with the same log file.
> |>
> |>If two backups happen in that period of time for whatever reason, you're
> |>going to have a false positive by looking into ${WAL_ARCHIVE} and
> |>searching just for the WAL name, so including the location in the search
> |>of a WAL fragment is certainly necessary. In fact, going purely by
> |>chance, the probability of hitting the same location in two different
> |>log files in two subsequent backups is much lower than hitting the same
> |>WAL twice.
> |
> | The current wal log is not being removed from the wal archive area
> | in any case. The files less than the current ones are being rm'ed.
> |
> | I am sorry i am not able to get your apprehension. But i shall
> | surely try harder to understand your point.
>
> Hi Rajesh, list.
>
> I'm sorry I didn't get back to you earlier, I was at an IBM business
> conference for a couple of days; not to say it rendered me incapable of
> communicating via e-mail, but it did bring along certain social
> responsibilities which caused me to both stay up and sleep late, if you
> know what I mean. :)
>
> Let me explain the above predicament in more practical terms.
>
> Let us say you started a backup very soon after a WAL had been rolled
> over. Current WAL at that time was called, for example,
> ${PGDATA}/pg_xlog/000000010000000E0000000A. The location at that time
> was 000F594A (iow, early in the WAL cycle).
> [disclaimer: all events in this story are entirely fictional, any
> similarity to actual persons and events is purely coincidental :) ]
>
> pg_start_backup() will create a WAL backup:
>   ${PGDATA}/pg_xlog/000000010000000E0000000A.000F594A.backup
>
> which will be archived to ${WAL_ARCHIVE} under the same name, or
> possibly given a different extension, depending on archive_method. Let
> us assume for the purpose of this explanation, that archive_method
> consists only of cp -i </dev/null, although the problem would have been
> identical if one used gzip -c, for example.
>
> Now, this backup fails for whatever reason (rsync trouble, etc.). You
> abort it and leave WAL archive as it was. You diagnose the problem that
> caused the backup to fail and repeat the procedure. And since your
> diagnostic skills are so good it took you almost no time to fix it, the
> database engine is now at location 002D94AF in that _same_ WAL.
>
> Once you restart the backup script, pg_start_backup() is called and
> ${PGDATA}/pg_xlog/000000010000000E0000000A.002D94AF.backup is created
> and archived to ${WAL_ARCHIVE} under that same name.
>
> Your method of discovering logs to delete will now match _two_ "current"
> log file archives instead of one, because they both come from the same
> WAL, fail to actually delete the stale one (the one from position
> 000F594A) and thus clutter your backup with irrelevant WAL fragments.

Dear Grega,

Thanks for the reply. Now I have started to understand!

Is cluttering of the WAL archive area, in cases where the backup had to
be restarted for whatever reason, the *only* concern? If so, we should
not be too bothered, because the extra clutter will get deleted by the
next successful backup. If there are other concerns, please let me know.
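
For illustration only, the scenario described above can be sketched in a
few lines of Python. This is not part of the backup script under
discussion: the archive listing is hypothetical (built from the file
names in the fictional example), and the function names are made up for
this sketch. It assumes backup markers are named
<WAL name>.<location>.backup, as in the example.

```python
import re

# Hypothetical contents of ${WAL_ARCHIVE} after the failed backup and its
# rerun; the names are taken from the fictional example above.
ARCHIVE = [
    "000000010000000E00000009",
    "000000010000000E0000000A",
    "000000010000000E0000000A.000F594A.backup",
    "000000010000000E0000000A.002D94AF.backup",
]

# Assumed naming: <24 hex digit WAL name>.<8 hex digit location>.backup
BACKUP_RE = re.compile(r"^([0-9A-F]{24})\.([0-9A-F]{8})\.backup$")


def backup_markers(files):
    """Yield (file name, WAL name, location) for every *.backup entry."""
    for name in files:
        match = BACKUP_RE.match(name)
        if match:
            yield name, match.group(1), match.group(2)


def match_by_name(files, wal):
    """Markers treated as 'current' when only the WAL name is compared."""
    return [f for f, w, _ in backup_markers(files) if w == wal]


def match_by_name_and_location(files, wal, location):
    """Markers treated as 'current' when WAL name and location are compared."""
    return [f for f, w, loc in backup_markers(files)
            if w == wal and loc == location]


def stale_markers(files, wal, location):
    """Markers that do not belong to the current backup."""
    current = set(match_by_name_and_location(files, wal, location))
    return [f for f, _, _ in backup_markers(files) if f not in current]
```

With this listing, match_by_name(ARCHIVE, "000000010000000E0000000A")
selects both .backup entries, while matching on name and location
(002D94AF) selects only the marker of the repeated backup and leaves the
000F594A one to be reported as stale.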
>
> The second part of the second paragraph was only to expose that,
> following the same logic as outlined above, if you take WAL locations as
> the criterion of removing stale WAL fragments instead of WAL names, it
> is far less likely to hit a false positive, because you would have to
> pg_start_backup() _exactly_ 4294967296 locations after the first one.
>
> Of course, you want to be unambiguous in your search of the perfect WAL
> archive, so you want to use _both_ WAL name and location as the criterion.
>
> | the old log files without the base backup are not useful. since
> | rsync is being used to optimise the copying by overwriting the
> | base backup every time, i don't think preserving the old files
> | makes sense. Had it been a non-overwriting backup the files
> | would have made sense.
>
> I see. I was assuming you used rsync to copy the database cluster
> somewhere then tar it there, while it was lying still ("Fell, Destroyed"
> of Fugazi comes to mind :) ).
>
> I will get back to you with the review of your script later. A quick
> scan reveals there is not much left to be improved, though.

Please do not put too much effort into it, as the drives in my other
server have now been installed and I am adapting the script to do
remote backups (which is the more common scenario).

Thank you.

Regards,
Rajesh Kumar Mallah.