Below is what i did -
1. pg_start_backup()
2. rsync the data dir
3. pg_stop_backup()
I believe the backup is valid because, i was able to bring up the cluster without any issues (ofcourse with data loss).
+ve signs-
I am able to bring up the cluster with the Online backup, but, only with the loss of data.
-ve signs and things to be strongly foreseen while backup testing -
- pg_clog files were not synced. I suspect they were being written at the time of backup. I might have tried to sync the data dir when pg_clog files were half filled.
- Though the WAL Archives are there, Postgres is not trying to recover beyond the timestamp at which pg_clog was missing.
- Even if i replace the missing pg_clog files (which i did), Postgres is asking for the corresponding wal archive files
Yes. What i learnt is that we need to ensure that all the pg_clog files must be fully copied as on the backup time. We cannot afford to miss any of them.
Thanks
Venkat
On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:
On 30/08/2011 6:59 PM, Venkat Balaji wrote:Did you do that after pg_start_backup() or on a stopped database server?
Hello Everyone,
I have a situation here -
I am trying to restore the production online backup and recover the same.
- I had initially rsynced (excluded pg_log) the data directory and the
tarred and zipped the same
If you did it on a running database server without first running pg_start_backup(), your backup is invalid.
Personally I like to take my base backups from an LVM snapshot of the datadir just to be extra safe. That isn't necessary, though, and a regular rsync or tar or whatever of a datadir after pg_start_backup() is fine.
Remember to run pg_stop_backup() afterwards.... from PostgreSQL, when you tried to start it?
- I got an error "unable to read <filename> from pg_clog location"
(file size is around 160K)
What emitted that error message?That won't work. You need a consistent snapshot of all the files in the data dir. You cannot just mix and match copies taken at different times.
What i understood is that, rsync some how missed out on syncing the
files in "pg_clog" so, i had manually coped the missing pg_clog file
from production and tried recovery.
For efficiency reasons PostgreSQL will recycle used clog files. You can't just copy a file over and hope that because it has the same name, it still contains the data you want.
Your backup *failed* at the point where you got an incomplete copy of the data directory.No, you need to get the missing clog files. If you cannot do that, try using pg_resetxlog, but be aware that that may lose transactions and can potentially cause corruption of tables and indexes.
Do i need to get that particular wal archive which is before online
backup time ?
That sounds right to me, but I don't know as much about how Pg stores things as I should.
By this experience what i understand is that Postgresql stores committed
and uncommited transactions in pg_xlog / wal archive files and
information (not the transaction data) about transaction commit status
is stored in pg_clog. Am I correct ?
Congratulations!
I am in the process of designing a disaster recovery planner for our
productions systems.
Be extremely glad this didn't happen in a real recovery scenario. This is a marvellous example of why you should always test your backups - you actually did, and found a problem that would've been a critical issue if the backup were actually needed.
--
Craig Ringer