Re-seeding the standby with a full base backup does seem to make the error go away. The standby started, caught up and has been working for about 2 hours. The file in the error message was an index. We rebuilt it just in case. Is there any way to debug the issue at this point? -----Original Message----- From: Lonni J Friedman [mailto:netllama@xxxxxxxxx] Sent: Saturday, June 22, 2013 4:11 PM To: Dan Kogan Cc: pgsql-general@xxxxxxxxxxxxxx Subject: Re: Standby stopped working after PANIC: WAL contains references to invalid pages Looks like some kind of data corruption. Question is whether it came from the master, or was created by the standby. If you re-seed the standby with a full (base) backup, does the problem go away? On Sat, Jun 22, 2013 at 12:43 PM, Dan Kogan <dan@xxxxxxxxxx> wrote: > Hello, > > > > Today our standby instance stopped working with this error in the log: > > > > 2013-06-22 16:27:32 UTC [8367]: [247-1] [] WARNING: page 158130 of > relation > pg_tblspc/16447/PG_9.2_201204301/16448/39154429 is uninitialized > > 2013-06-22 16:27:32 UTC [8367]: [248-1] [] CONTEXT: xlog redo vacuum: > rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129 > > 2013-06-22 16:27:32 UTC [8367]: [249-1] [] PANIC: WAL contains > references to invalid pages > > 2013-06-22 16:27:32 UTC [8367]: [250-1] [] CONTEXT: xlog redo vacuum: > rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129 > > 2013-06-22 16:27:32 UTC [8366]: [3-1] [] LOG: startup process (PID > 8367) was terminated by signal 6: Aborted > > 2013-06-22 16:27:32 UTC [8366]: [4-1] [] LOG: terminating any other > active server processes > > > > After re-start the same exact error occurred. > > > > We thought that maybe we hit this bug - > http://postgresql.1045698.n5.nabble.com/Completely-broken-replica-after-PANIC-WAL-contains-references-to-invalid-pages-td5750072.html. > > However, there is nothing in our log about sub-transactions, so it > didn't seem the same to us. > > > > Any advice on how to further debug this so we can avoid this in the > future is appreciated. > > > > Environment: > > > > AWS, High I/O instance (hi1.4xlarge), 60GB RAM > > > > Software and settings: > > > > PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc > (Ubuntu/Linaro > 4.5.2-8ubuntu4) 4.5.2, 64-bit > > > > archive_command rsync -a %p > slave:/var/lib/postgresql/replication_load/%f > > archive_mode on > > autovacuum_freeze_max_age 1000000000 > > autovacuum_max_workers 6 > > checkpoint_completion_target 0.9 > > checkpoint_segments 128 > > checkpoint_timeout 30min > > default_text_search_config pg_catalog.english > > hot_standby on > > lc_messages en_US.UTF-8 > > lc_monetary en_US.UTF-8 > > lc_numeric en_US.UTF-8 > > lc_time en_US.UTF-8 > > listen_addresses * > > log_checkpoints on > > log_destination stderr > > log_line_prefix %t [%p]: [%l-1] [%h] > > log_min_duration_statement -1 > > log_min_error_statement error > > log_min_messages error > > log_timezone UTC > > maintenance_work_mem 1GB > > max_connections 1200 > > max_standby_streaming_delay 90s > > max_wal_senders 5 > > port 5432 > > random_page_cost 2 > > seq_page_cost 1 > > shared_buffers 4GB > > ssl off > > ssl_cert_file /etc/ssl/certs/ssl-cert-snakeoil.pem > > ssl_key_file /etc/ssl/private/ssl-cert-snakeoil.key > > synchronous_commit off > > TimeZone UTC > > wal_keep_segments 128 > > wal_level hot_standby > > work_mem 8MB > > > > root@ip-10-148-131-236:~# /usr/local/pgsql/bin/pg_controldata > /usr/local/pgsql/data > > pg_control version number: 922 > > Catalog version number: 201204301 > > Database system identifier: 5838668587531239413 > > Database cluster state: in archive recovery > > pg_control last modified: Sat 22 Jun 2013 06:13:07 PM UTC > > Latest checkpoint location: 2250/18CA0790 > > Prior checkpoint location: 2250/18CA0790 > > Latest checkpoint's REDO location: 224F/E127B078 > > Latest checkpoint's TimeLineID: 2 > > Latest checkpoint's full_page_writes: on > > Latest checkpoint's NextXID: 1/2018629527 > > Latest checkpoint's NextOID: 43086248 > > Latest checkpoint's NextMultiXactId: 7088726 > > Latest checkpoint's NextMultiOffset: 20617234 > > Latest checkpoint's oldestXID: 1690316999 > > Latest checkpoint's oldestXID's DB: 16448 > > Latest checkpoint's oldestActiveXID: 2018629527 > > Time of latest checkpoint: Sat 22 Jun 2013 03:24:05 PM UTC > > Minimum recovery ending location: 2251/5EA631F0 > > Backup start location: 0/0 > > Backup end location: 0/0 > > End-of-backup record required: no > > Current wal_level setting: hot_standby > > Current max_connections setting: 1200 > > Current max_prepared_xacts setting: 0 > > Current max_locks_per_xact setting: 64 > > Maximum data alignment: 8 > > Database block size: 8192 > > Blocks per segment of large relation: 131072 > > WAL block size: 8192 > > Bytes per WAL segment: 16777216 > > Maximum length of identifiers: 64 > > Maximum columns in an index: 32 > > Maximum size of a TOAST chunk: 1996 > > Date/time type storage: 64-bit integers > > Float4 argument passing: by value > > Float8 argument passing: by value > > root@ip-10-148-131-236:~# > > > > Thanks again. > > > > Dan -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ L. Friedman netllama@xxxxxxxxx LlamaLand https://netllama.linux-sxs.org -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general