Re: pg v. 8.4.5 misses objects and data after restoring from backup using wal

Imre Oolberg <imre@xxxxxxxxxxx> · Mon, 03 Jan 2011 12:59:33 +0200

Hi and Happy New Year!

Unfortunately i still have the same problem, in the mean time i tried 
out with different versions and still get same results. Experimenting on 
Debian Lenny v 5.0 with v. compiled v. 9.0.2 where default conf is 
changed only in this

wal_level = archive
archive_mode = on
archive_command = 'test ! -f /data/backup/postgresql/archive-logs/%f && 
cp %p /data/backup/postgresql/archive-logs'

and having as a starting point of the recovery

1. data cluster file sytem copied with rsync between pg_start_backup and 
pg_stop backup
2. wal logs archived starting earier that pg_start_backup and continuing 
after pg_stop_backup

I guess that my problem is probably that although my .backup file says

000000010000000000000009.00000020.backup
START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
STOP WAL LOCATION: 1/6325B2E0 (file 000000010000000100000063)
CHECKPOINT LOCATION: 0/A2C4908
START TIME: 2011-01-03 11:32:17 EET
LABEL: test
STOP TIME: 2011-01-03 12:18:27 EET

looking at the pg_ctl.log it starts reading wal logs beginning with 
000000010000000100000063 and not with 000000010000000000000009

LOG:  database system was interrupted; last known up at 2011-01-03 
11:58:56 EET
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  starting archive recovery
LOG:  restored log file "000000010000000100000063" from archive
LOG:  consistent recovery state reached at 1/6325B2B8
LOG:  redo starts at 1/6325B2B8
LOG:  restored log file "000000010000000100000064" from archive
LOG:  restored log file "000000010000000100000065" from archive
LOG:  restored log file "000000010000000100000066" from archive
...
LOG:  restored log file "000000010000000100000081" from archive
cp: cannot stat 
`/data/backup/postgresql/archive-logs/000000010000000100000082': No such 
file or directory
LOG:  could not open file "pg_xlog/000000010000000100000082" (log file 
1, segment 130): No such file or directory
LOG:  redo done at 1/81530E18
LOG:  last completed transaction was at log time 2011-01-03 
12:20:31.917695+02
LOG:  restored log file "000000010000000100000081" from archive
cp: cannot stat `/data/backup/postgresql/archive-logs/00000002.history': 
No such file or directory
LOG:  selected new timeline ID: 2
cp: cannot stat `/data/backup/postgresql/archive-logs/00000001.history': 
No such file or directory
LOG:  archive recovery complete
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

I tried to follow a pitr procedure as per 
http://www.postgresql.org/docs/9.0/interactive/continuous-archiving.html 
and i would be very greatful if somebody comments this and points me to 
the right direction.

Best regards,

Imre

On 12/30/10 13:03, Imre Oolberg wrote:
Hi!

I did my usual test restoration from backup which leaded me to suspect 
that some data is missing although database starts up. I did some more 
testing to reproduce the problem as generally as i could and i wish to 
present it here and hope that somebody could comment on this. It may 
well be my bad as i am doing some systematic error or something else.

I use debian lenny amd64 with postgresql from backports which is

# dpkg -l postgre\* | grep ii
ii  postgresql-8.4                      8.4.5-2~bpo50+1 
object-relational SQL database, version 8.4
ii  postgresql-client-8.4               8.4.5-2~bpo50+1 front-end 
programs for PostgreSQL 8.4
ii  postgresql-client-common            111~bpo50+1 manager for 
multiple PostgreSQL client versi
ii  postgresql-common                   111~bpo50+1 PostgreSQL 
database-cluster manager
ii  postgresql-contrib-8.4              8.4.5-2~bpo50+1 additional 
facilities for PostgreSQL
ii  postgresql-plperl-8.4               8.4.5-2~bpo50+1 PL/Perl 
procedural language for PostgreSQL 8
ii  postgresql-server-dev-8.4           8.4.5-2~bpo50+1 development 
files for PostgreSQL 8.4 server-

In addition i have compiled and installed pljava, plproxy and skytools 
libraries but in this test i do not use them, i.e. i guess it doest 
really matter that the are in /usr/lib/postgresql/8.4/lib directory 
since no languages/functions are defined to use them.

PostgreSQL config is default except wal logging is turned on.

My procedure consists of the following steps

1. start up database
2. created schema test1 and into it table test like

CREATE TABLE test1.test (
  test_id serial NOT NULL,
  pilt bytea,
  CONSTRAINT test_id_pkey PRIMARY KEY (test_id)
)

and started inserting data with this script

import psycopg2, cPickle
conn = psycopg2.connect("dbname=test host=127.0.0.1 user=postgres 
password=xxx port=5432")
cur = conn.cursor()

f = open('/boot/initrd.img-2.6.26-2-amd64', 'rb')
binary = f.read()
sql = "insert into test1.test (pilt) values (%s)";
cur.execute(sql, (psycopg2.Binary(binary),))
cur.close()
conn.commit()
conn.close()

like

$ for i in `seq 1 200`; do echo $i; python pi1.py; sleep 10; done

3. issued pg_start_backup('test') while script is still running
4. created schemas and tables test2.test and test3.test and started 
insterting data similarly
5. while three scripts are running issued rsync -avH --bwlimit=1024 
root@localhost:/var/lib/postgresql/ /mnt/backup/
6. while rsync is running i created schemas and tables test4.test and 
test5.test and started insterting data similarly
7. when scripts and rsync have been completed i said pg_stop_backup
8. created schema and table test6.test and started insterting data 
similarly

When everything was finished i had approx 5g used as data cluster 
filesystem and about 400 wal logs by 16 MB each.

Restore went like this

1. mounted /mnt/backup/ under /var/lib/postgresql
2. removed backup_label
3. created recovery.conf with a line

restore_command = 'cp /data/backup/postgresql/archive-logs/%f %p'

4. emptied pg_xlog directory while having directory itself
5. started postgresql and it finishes accepting connections

And now observations

1. while i have 400 wal log files during recovery it used only about 
last 20 of them
2. while original working postgresql data cluster filesystem's used 
size was about 5 gb after restore used space is 2.6 gb
3. looking thru tables, most of them have quarter of half less entries 
than was actually insterted and is present in original database
4. some tables are missing (test4.test) and some give errors like 
(test5.test) when i say select count(*) from test5.test;

2010-12-30 12:41:23 EET ERROR:  catalog is missing 2 attribute(s) for 
relid 16586 at character 22
2010-12-30 12:41:23 EET STATEMENT:  select count(*) from test5.test;

tried it also with compiled version of 8.4.5 but i chose to use lenny 
+ backports because it may be this way more reproducable that my 
compiled instance. Actually my data size is about 160 gb, and i tested 
also with sizes about 1g and 4 tables but successful reproduction 
seems to need database size in filesystem around 5 g and 6 tables.

I undestand that this report is not very precise, i.e. i have not 
fixed how much data was interted into database at the moment rsyncing 
started but i hope it has some useful points and someone could help me 
further, i am also ready to do some more testing if it is helpful.

Imre

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin