Hi all,
I have a problem with a corrupt backup, fortunately I was only testing
so I did not loose any data. Unfortunetely what I did is to follow the
backup guidelines in the documentation, which I thought should work
reliably. Here are the details:
I am running a postgreSQL 8.4 database on a Debian Squeeze system. For
Backups I am using the warm standby and "Incrementally Updated Backup"
method as described in chapter 24.4 of the documentation. So my Setup is
as follows:
Server 1 (Main): PostgreSQL 8.4 Database with archive_mode enabled
shipping WAL files to a NFS drive. Size of database is about 370 GB and
growing.
Server 2 (Replica): PostgreSQL 8.4 Database in recovery mode. Using
pg_standby in recovery.conf and getting WAL files from Server 1 NFS drive.
All this works fine and runs without errors.
The replica is backed up once a week using rsync, a full backup runs
about 10 hours, so I also keep at least 24h of WAL files to make sure I
have a consistent backup.
The backup process also runs fine without errors, only the time (10h) it
takes is quite long, so I decided to test the backup:
1) Restored the full backup to a test directory
(var/lib/postgresql/8.4/test)
2) Copied the configuration of the main server to
/etc/postgresql/8.4/test/
Altered port number, paths and turned off archive mode in postgresql.conf.
3) Added a recovery.conf in the test servers data dir. Recovering from
my backed up WAL files:
restore_command = 'cp /var/postgresql-wal-test/%f "%p"'
4) Started the test server (pg_ctlcluster 8.4 test start)
5) Waited until recovery was done (everything worked fine until then)
2012-09-25 08:26:41 UTC LOG: database system is ready to accept connections
2012-09-25 08:26:41 UTC LOG: autovacuum launcher started
6) Connected via psql to the database and tried a \d to see my tables
which did *not* work!
Here is the output:
2012-09-25 08:27:03 UTC ERROR: could not access status of transaction
500185903
2012-09-25 08:27:03 UTC DETAIL: Could not open file "pg_clog/01DD": No
such file or directory.
Also trying to SELECT data from the database tables failed with the same
error.
The backup is corrupt. So my question is, what went wrong:
Obviously as the rsync started it copied everything from the pg_clog
(which at this point was until pg_clog/01DC) and then went on for
another 10+ hours backing up all the rest of the database. At the time
the backup ended, the database content changed but the newer clog files
did not go into the backup.
When restoring the backup and starting the server, the recovery process
started at a point where pg_clog was at state 01DE or even further and
thus the data from 01DD was missing.
So what I do from now, is an extra daily backup of my clog directory to
make sure to have working backups. This is not documented in the
postgreSQL documentation, and since the result in not doing so can be
quite severe I think you should consider this in future PostgreSQL
documentation versions.
Regards,
Juergen
Additional information:
The EXACT PostgreSQL version you are running:
PostgreSQL 8.4.13 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 32-bit
How you installed PostgreSQL
From Linux distro package management: Debian/Aptitude
If so, what repository?
deb http://ftp.at.debian.org/debian/ squeeze main non-free contrib
deb-src http://ftp.at.debian.org/debian/ squeeze main non-free contrib
deb http://security.debian.org/ squeeze/updates main contrib non-free
deb-src http://security.debian.org/ squeeze/updates main contrib non-free
deb http://ftp.debian.org/debian squeeze-updates main contrib non-free
deb-src http://ftp.debian.org/debian squeeze-updates main contrib non-free
Changes made to the settings in the postgresql.conf file:
name | current_setting
----------------------+-------------------------------------------------
version | PostgreSQL 8.4.13 on i486-pc-linux-gnu,
compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 32-bit
archive_command | cp -i %p /var/postgres-wal/%f </dev/null && cp
-i %p /var/postgres-wal/bak/%f </dev/null && gzip /var/postgres-wal/bak/%f
archive_mode | on
archive_timeout | 0
client_encoding | utf8
effective_cache_size | 1000MB
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_line_prefix | %t
maintenance_work_mem | 256MB
max_connections | 100
max_stack_depth | 2MB
password_encryption | on
port | 5432
server_encoding | UTF8
shared_buffers | 650MB
ssl | on
synchronous_commit | off
TimeZone | UTC
work_mem | 40MB
Operating system and version
Linux distro and version:
Debian 6.0.6 (squeeze)
Kernel details:
Linux wegc203094 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686
GNU/Linux
What program you're using to connect to PostgreSQL:
psql and phpPgAdmin
--
| Juergen Fuchsberger
| Wegener Center for Climate and Global Change
| Karl-Franzens-University Graz
| Leechgasse 25, A-8010 Graz
| phone: +43-316-380-8438
| fax: +43-316-380-9830
| eMail: juergen.fuchsberger@xxxxxxxxxxx
| web: www.wegcenter.at
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general