Re: Postgresql-9.0.1 Recovery

Venkat Balaji <venkat.balaji@xxxxxxxx> · Wed, 31 Aug 2011 12:10:36 +0530

Thanks Craig !
Below  is what i did -

1. pg_start_backup()
2. rsync the data dir
3. pg_stop_backup()

I believe the backup is valid because, i was able to bring up the cluster without any issues (ofcourse with data loss).

+ve signs-

I am able to bring up the cluster with the Online backup, but, only with the loss  of data.

-ve signs and things to be strongly foreseen while backup testing -
pg_clog files were not synced. I suspect they were being written at the time of backup. I might have tried to sync the data dir when pg_clog files were half filled.
Though the WAL Archives are there, Postgres is not trying to recover beyond the timestamp at which pg_clog was missing. 
Even if i replace the missing pg_clog files (which i did), Postgres is asking for the corresponding wal archive files
Yes. What i learnt is that we need to ensure that all the pg_clog files must be fully copied as on the backup time. We cannot afford to miss any of them.

Thanks
Venkat
On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:

On 30/08/2011 6:59 PM, Venkat Balaji wrote:

Hello Everyone,

I have a situation here -

I am trying to restore the production online backup and recover the same.

  - I had initially rsynced (excluded pg_log) the data directory and the

tarred and zipped the same

Did you do that after pg_start_backup() or on a stopped database server?

If you did it on a running database server without first running pg_start_backup(), your backup is invalid.

Personally I like to take my base backups from an LVM snapshot of the datadir just to be extra safe. That isn't necessary, though, and a regular rsync or tar or whatever of a datadir after pg_start_backup() is fine.

Remember to run pg_stop_backup() afterwards.

  - I got an error "unable to read <filename> from pg_clog location"

(file size is around 160K)

... from PostgreSQL, when you tried to start it?

What emitted that error message?

What i understood is that, rsync some how missed out on syncing the

files in "pg_clog"  so, i had manually coped the missing pg_clog file

from production and tried recovery.

That won't work. You need a consistent snapshot of all the files in the data dir. You cannot just mix and match copies taken at different times.

For efficiency reasons PostgreSQL will recycle used clog files. You can't just copy a file over and hope that because it has the same name, it still contains the data you want.

Your backup *failed* at the point where you got an incomplete copy of the data directory.

Do i need to get that particular wal archive which is before online

backup time ?

No, you need to get the missing clog files. If you cannot do that, try using pg_resetxlog, but be aware that that may lose transactions and can potentially cause corruption of tables and indexes.

By this experience what i understand is that Postgresql stores committed

and uncommited transactions in pg_xlog / wal archive files and

information (not the transaction data) about transaction commit status

is stored in pg_clog. Am I correct ?

That sounds right to me, but I don't know as much about how Pg stores things as I should.

I am in the process of designing a disaster recovery planner for our

productions systems.

Congratulations!

Be extremely glad this didn't happen in a real recovery scenario. This is a marvellous example of why you should always test your backups - you actually did, and found a problem that would've been a critical issue if the backup were actually needed.

--

Craig Ringer