> So that explains the crashes, but it doesn't (directly) explain why you
> had data corruption.

The first crash message:

Oct 5 15:00:21 db01 postgres[75852]: [6406-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/18770": Too many open files in system
Oct 5 15:00:22 db01 postgres[75852]: [6413-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20259": Too many open files in system
Oct 5 15:00:23 db01 postgres[75852]: [6424-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20218": Too many open files in system
Oct 5 15:00:23 db01 postgres[75852]: [6431-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20218": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6442-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20218": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6449-1] failjava@dbcluster 75852 0: ERROR: could not open file "base/16387/20302": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6456-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20013": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6463-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20259": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6470-1] failjava@dbcluster 75852 0: ERROR: could not seek to end of file "base/16387/20259": Too many open files in system
Oct 5 15:00:24 db01 postgres[75852]: [6480-1] failjava@dbcluster 75852 0: ERROR: could not open file "base/16387/20275": Too many open files in system
Oct 5 15:00:25 db01 postgres[75852]: [6481-1] failjava@dbcluster 75852 0: ERROR: current transaction is aborted, commands ignored until end of transaction block
Oct 5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:00:25 db01 postgres[76407]: [5894-1] failjava@dbcluster 76407 0: WARNING: terminating connection because of crash of another server process

For the sake of completeness, I looked to correlate the above relations with the tables that had problems, and there is zero correlation between the seek(2) errors and the tables for which VACUUM reported invalid pages.

     nspname     |     relname     | relnamespace | relfilenode | relkind |  oid
 ----------------+-----------------+--------------+-------------+---------+-------
 failjava_master | tbl_F           |        16388 |       16402 | r       | 16402
 failjava_db1    | tbl_D           |        16417 |       18473 | r       | 18473
 failjava_db1    | tbl_E           |        16417 |       20179 | r       | 20179
 failjava_db1    | tbl_C           |        16417 |       20187 | r       | 20187
 failjava_db1    | tbl_B           |        16417 |       20200 | r       | 20200
 failjava_db1    | tbl_A           |        16417 |       41731 | r       | 19533

But I'm wondering if it died a second time before the system had finished its recovery:

Oct 5 15:01:31 db01 postgres[76888]: [6503-1] @ 76888 0: WARNING: could not open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct 5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77572]: [5980-1] failjava@dbcluster 77572 0: WARNING: terminating connection because of crash of another server process
Oct 5 15:01:31 db01 postgres[77544]: [7490-1] failjava@dbcluster 77544 0: WARNING: terminating connection because of crash of another server process

Unfortunately the logs I have for the first set of crashes are limited to WARNING and above and don't include INFO-level messages (I do have those later on; due to various IRL constraints - hardware failure of the intended host - the provisioning of the master was a bit of a cannonball dash).
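For reference, the relfilenode-to-relname mapping shown above can be regenerated with a catalog query of roughly this shape. This is a sketch: the relfilenode list is copied from the seek(2)/open errors above, and the query is printed rather than executed so it can be piped into psql against the real cluster.

```shell
# Sketch: map the relfilenodes from the "could not seek/open" errors back to
# table names via pg_class/pg_namespace. The SQL is printed so it can be run
# against the live cluster with, e.g.:
#   sh this_snippet.sh | psql -d dbcluster
relfilenodes="18770 20259 20218 20302 20013 20275"
in_list=$(echo "$relfilenodes" | tr ' ' ',')

cat <<SQL
SELECT n.nspname, c.relname, c.relnamespace, c.relfilenode, c.relkind, c.oid
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relfilenode IN ($in_list);
SQL
```

Note that relfilenode can diverge from oid after a rewrite (as tbl_A's 41731 vs 19533 shows), which is why the join is on relfilenode rather than oid.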
> I think the uninitialized pages are showing up because you had crashes
> in the midst of relation-extension operations, ie, some other backend
> had successfully done an smgrextend but hadn't yet laid down any valid
> data in the new page.  However, this theory would not explain more than
> one uninitialized page per crash, and your previous message seems to
> show rather a lot of uninitialized pages.  How many pipe-failed crashes
> did you have?

Assuming ~1min per recovery, it looks like around 38.

Oct 5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:02:20 db01 postgres[78473]: [5987-1] failjava@dbcluster 78473 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:03:47 db01 postgres[79408]: [6005-1] failjava@dbcluster 79408 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:05:06 db01 postgres[80440]: [6112-1] failjava@dbcluster 80440 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:06:09 db01 postgres[81276]: [6117-1] @ 81276 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:07:04 db01 postgres[82102]: [6131-1] failjava@dbcluster 82102 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:08:19 db01 postgres[82336]: [6136-1] @ 82336 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:08:20 db01 postgres[83084]: [6161-1] @ 83084 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:08:22 db01 postgres[83091]: [6184-1] failjava@dbcluster 83091 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:09:36 db01 postgres[84096]: [6218-1] failjava@dbcluster 84096 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:10:50 db01 postgres[85064]: [6226-1] failjava@dbcluster 85064 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:12:05 db01 postgres[86106]: [6233-1] failjava@dbcluster 86106 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86107]: [6233-1] failjava@dbcluster 86107 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86108]: [6233-1] failjava@dbcluster 86108 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86109]: [6233-1] failjava@dbcluster 86109 0: FATAL: could not open relation mapping file "global/pg_filenode.map": Too many open files in system
Oct 5 15:12:05 db01 postgres[86105]: [6234-1] failjava@dbcluster 86105 0: FATAL: could not open file "base/16387/PG_VERSION": Too many open files in system
Oct 5 15:12:08 db01 postgres[86119]: [6252-1] failjava@dbcluster 86119 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:13:36 db01 postgres[87126]: [6295-1] failjava@dbcluster 87126 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:14:23 db01 postgres[88122]: [6364-1] failjava@dbcluster 88122 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:15:31 db01 postgres[88370]: [6369-1] @ 88370 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:15:31 db01 postgres[89057]: [6398-1] @ 89057 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:15:34 db01 postgres[89067]: [6426-1] failjava@dbcluster 89067 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:16:38 db01 postgres[90080]: [6430-1] @ 90080 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:17:19 db01 postgres[90941]: [6446-1] failjava@dbcluster 90941 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:18:43 db01 postgres[91898]: [6484-1] failjava@dbcluster 91898 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:20:02 db01 postgres[93068]: [6514-1] failjava@dbcluster 93068 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:22:03 db01 postgres[94275]: [6519-1] failjava@dbcluster 94275 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:22:03 db01 postgres[94276]: [6519-1] failjava@dbcluster 94276 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:23:39 db01 postgres[95392]: [6525-1] failjava@dbcluster 95392 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:24:52 db01 postgres[96371]: [6533-1] failjava@dbcluster 96371 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:26:22 db01 postgres[97337]: [6570-1] failjava@dbcluster 97337 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:27:48 db01 postgres[98419]: [6607-1] failjava@dbcluster 98419 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:29:02 db01 postgres[99379]: [6706-1] failjava@dbcluster 99379 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:30:20 db01 postgres[467]: [6719-1] failjava@dbcluster 467 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:32:08 db01 postgres[1479]: [6769-1] failjava@dbcluster 1479 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:33:28 db01 postgres[2714]: [6776-1] failjava@dbcluster 2714 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:34:18 db01 postgres[3600]: [6870-1] failjava@dbcluster 3600 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:36:07 db01 postgres[4608]: [6896-1] failjava@dbcluster 4608 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:37:18 db01 postgres[5652]: [6944-1] failjava@dbcluster 5652 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:38:56 db01 postgres[6640]: [7018-1] failjava@dbcluster 6640 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:39:46 db01 postgres[7680]: [7025-1] failjava@dbcluster 7680 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:41:08 db01 postgres[8609]: [7081-1] failjava@dbcluster 8609 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:42:20 db01 postgres[9546]: [7095-1] failjava@dbcluster 9546 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:43:53 db01 postgres[10468]: [7127-1] failjava@dbcluster 10468 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:45:32 db01 postgres[11634]: [7159-1] failjava@dbcluster 11634 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:47:10 db01 postgres[12721]: [7184-1] failjava@dbcluster 12721 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:48:47 db01 postgres[13831]: [7194-1] failjava@dbcluster 13831 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:49:47 db01 postgres[14912]: [7217-1] failjava@dbcluster 14912 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:51:32 db01 postgres[15928]: [7223-1] failjava@dbcluster 15928 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:52:34 db01 postgres[16954]: [7231-1] failjava@dbcluster 16954 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:53:49 db01 postgres[17963]: [7247-1] failjava@dbcluster 17963 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:55:36 db01 postgres[19309]: [7269-1] failjava@dbcluster 19309 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:57:08 db01 postgres[20273]: [7286-1] failjava@dbcluster 20273 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:58:34 db01 postgres[20616]: [7291-1] @ 20616 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:58:34 db01 postgres[21404]: [7293-1] @ 21404 0: FATAL: could not open archive status directory "pg_xlog/archive_status": Too many open files in system
Oct 5 15:58:47 db01 postgres[21442]: [7294-1] @ 21442 0: FATAL: pipe() failed: Too many open files in system
Oct 5 15:58:49 db01 postgres[21450]: [7318-1] failjava@dbcluster 21450 0: FATAL: pipe() failed: Too many open files in system

[ Once I set up monitoring, I became aware that the app (in a 100% idle state) was consuming ~80K fds because the app's initialization procedure touches nearly all 500 tables across its ~2K connections ]

<rant>It would be exceptionally nice if JDBC were written intelligently enough that pgbouncer could be used in transaction mode, thereby allowing exceptionally poorly written apps - database connection code that establishes 5-20 connections per thread - to be parked in front of pgbouncer instead of needing to talk directly to a backend process. Use of duplicate names for prepared statements, or preparing statements outside of a transaction, really ties the hands of folks who are attempting to insert additional layers into the stack to scale up applications that are effectively in an immutable and broken state of disrepair. <pipedream>A more realistic but disgusting solution would be to teach pgbouncer to intercept all prepared statements from a given client and rewrite the names on the fly into something that acts as a unique cookie, which could be reused across backends if a given backend hasn't seen that prepared statement before.</pipedream></rant>

I'm trying to determine that now. I actually only included a small snippet of the uninitialized pages.
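The ~80K idle-fd figure above is consistent with a back-of-envelope estimate. This is a sketch with inferred numbers: the ~40 cached file descriptors per backend is derived from the observed totals (80K fds / 2K connections), not measured directly.

```shell
# Back-of-envelope check of the idle descriptor consumption noted above.
# Assumptions (inferred, not measured): ~2,000 app connections, each backend's
# fd cache holding ~40 open relation files after the app's startup procedure
# touches most of the ~500 tables.
connections=2000
fds_per_backend=40
total_fds=$((connections * fds_per_backend))
echo "$total_fds"   # prints 80000
```

A total in that range easily exhausts a default kern.maxfiles, which is why raising it was the operative fix on the master.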
It looks like there are 28 total uninitialized pages that VACUUM discovered:

[ autovacuum appears to have caught these two, however the slave wasn't turned up at this point ]

Oct 5 19:42:44 db01 postgres[37748]: [8908-1] @ 37748 0: WARNING: relation "tbl_E" page 4263 is uninitialized --- fixing
Oct 5 19:43:03 db01 postgres[37324]: [8908-1] @ 37324 0: WARNING: relation "tbl_B" page 2628 is uninitialized --- fixing

[ And the remaining 26 were caught via an explicitly executed `vacuumdb -az` ]

Oct 6 17:03:24 db01 postgres[73864]: [8911-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_F" page 65722 is uninitialized --- fixing
Oct 6 17:03:43 db01 postgres[73864]: [8912-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_D" page 3563 is uninitialized --- fixing
Oct 6 17:03:43 db01 postgres[73864]: [8913-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_D" page 3564 is uninitialized --- fixing
Oct 6 17:03:43 db01 postgres[73864]: [8914-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_D" page 3565 is uninitialized --- fixing
Oct 6 17:03:49 db01 postgres[73864]: [8915-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 65305 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8916-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 70927 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8917-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 75286 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8918-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 86601 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8919-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 91434 is uninitialized --- fixing
Oct 6 17:03:50 db01 postgres[73864]: [8920-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 91435 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8921-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 95639 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8922-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 105638 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8923-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 114550 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8924-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 118380 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8925-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 121478 is uninitialized --- fixing
Oct 6 17:03:51 db01 postgres[73864]: [8926-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 122098 is uninitialized --- fixing
Oct 6 17:03:52 db01 postgres[73864]: [8927-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 127195 is uninitialized --- fixing
Oct 6 17:03:53 db01 postgres[73864]: [8928-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 152416 is uninitialized --- fixing
Oct 6 17:03:55 db01 postgres[73864]: [8929-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 230192 is uninitialized --- fixing
Oct 6 17:03:56 db01 postgres[73864]: [8930-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_C" page 241091 is uninitialized --- fixing
Oct 6 17:20:31 db01 postgres[73864]: [8931-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32379133 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8932-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32385954 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8933-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32393740 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8934-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32394599 is uninitialized --- fixing
Oct 6 17:20:32 db01 postgres[73864]: [8935-1] pgsql@dbcluster 73864 0: WARNING: relation "tbl_A" page 32402510 is uninitialized --- fixing
Oct 6 17:20:33 db01 postgres[73864]: [8936-1] pgsql@dbcluster 73864 0:
WARNING: relation "tbl_A" page 32409328 is uninitialized --- fixing

>> What's odd to me is not the failure scenarios that come from a system
>> running out of FDs (though seek(2)'ing consuming an FD seems odd), it's
>> more that it's still possible for a master DB's VACUUM to clean up a
>> bogus or partial page write, and have the slave crash when the WAL
>> entry is shipped over.
>
> It looks to me like vacuumlazy.c doesn't bother to emit a WAL record at
> all when fixing an all-zeroes heap page.  I'm not sure if that's a
> problem or not.  The first actual use of such a page ought to result in
> re-init'ing it anyway (cf XLOG_HEAP_INIT_PAGE logic in heap_insert),
> so right offhand I don't see a path from this to the slave-side failures
> you saw.  (But on the other hand I'm only firing on one cylinder today
> because of a head cold, so maybe I'm just missing it.)

(Good luck w/ the recovery)

> Do the slave-side failures correspond to pages that were reported as
> "fixed" on the master?

It doesn't appear so.

Oct 6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized

I do see a few other odd or related messages on the slave, however:

[ from a recovery ]
Oct 6 07:14:50 db02 postgres[11972]: [12-1] @ 11972 0: LOG: record with zero length at 9E/699842C0
Oct 6 07:14:50 db02 postgres[11972]: [13-1] @ 11972 0: LOG: redo done at 9E/69984290
Oct 6 07:14:50 db02 postgres[11972]: [14-1] @ 11972 0: LOG: last completed transaction was at log time 2012-10-06 07:12:11.87694+00
Oct 6 07:14:50 db02 postgres[11972]: [15-1] @ 11972 0: LOG: checkpoint starting: end-of-recovery immediate

[ Best to let sleeping dragons lie; I shut the slave down for the night in case there was some form of feedback that could be sent back that could hang the master ]
Oct 6 07:51:32 db02 postgres[15505]: [16-1] @ 15505 0: LOG: database system is shut down

[ I think this is from trying to connect to an IPv6 addr for localhost, but IPv6 has been disabled. Including it because the verbosity level seems excessive. ]
Oct 6 16:36:06 db02 postgres[4529]: [7-1] @ 4529 0: LOG: could not create socket for statistics collector: Protocol not supported
Oct 6 16:36:06 db02 postgres[4529]: [8-1] @ 4529 0: LOG: trying another address for the statistics collector

[ Here the database noted the bogus page, but it was still able to start up ]
Oct 6 16:36:06 db02 postgres[4535]: [9-1] @ 4535 0: LOG: database system was interrupted; last known up at 2012-10-06 08:07:41 UTC
Oct 6 16:36:06 db02 postgres[4535]: [10-1] @ 4535 0: LOG: entering standby mode
Oct 6 16:36:06 db02 postgres[4535]: [11-1] @ 4535 0: LOG: unexpected pageaddr 88/EE000000 in log file 158, segment 110, offset 0

[ Including this as a small nit: two lines are emitted for the following log entry, which seems incorrect ]
Oct 6 16:41:31 db02 postgres[6416]: [9-1] @ 6416 0: FATAL: could not connect to the primary server: fe_sendauth: no password supplied
Oct 6 16:41:31 db02 postgres[6416]: [9-2]
Oct 6 16:41:36 db02 postgres[6445]: [9-1] @ 6445 0: LOG: streaming replication successfully connected to primary
Oct 6 16:41:37 db02 postgres[4535]: [12-1] @ 4535 0: LOG: redo starts at 9E/6E000020
Oct 6 16:41:37 db02 postgres[4535]: [13-1] @ 4535 0: LOG: consistent recovery state reached at 9E/6E003E10
Oct 6 16:41:37 db02 postgres[4529]: [9-1] @ 4529 0: LOG: database system is ready to accept read only connections

[ Here's what the slave sees on a crash ]
Oct 6 17:09:17 db02 postgres[15561]: [10-1] pgsql@postgres 15561 0: FATAL: the database system is starting up
Oct 6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG: consistent recovery state reached at 9E/E3DDE2B0
Oct 6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG: database system is ready to accept read only connections
Oct 6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized
Oct 6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT: xlog redo
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC: WAL contains references to invalid pages
Oct 6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG: startup process (PID 15191) was terminated by signal 6: Abort trap

[ /me scratches head for a bit ]

Oct 6 17:21:34 db02 postgres[19614]: [9-1] @ 19614 0: LOG: database system was interrupted while in recovery at log time 2012-10-06 08:07:24 UTC
Oct 6 17:21:34 db02 postgres[19614]: [9-2] @ 19614 0: HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
Oct 6 17:21:34 db02 postgres[19614]: [10-1] @ 19614 0: LOG: entering standby mode
Oct 6 17:21:34 db02 postgres[19614]: [11-1] @ 19614 0: LOG: redo starts at 9E/6E000020

[ This is the point at which I came up with my hypothesis and enabled full_page_writes ]
Oct 6 17:24:14 db02 postgres[19608]: [9-1] @ 19608 0: LOG: received SIGHUP, reloading configuration files
Oct 6 17:24:14 db02 postgres[19608]: [10-1] @ 19608 0: LOG: parameter "full_page_writes" changed to "on"
Oct 6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG: consistent recovery state reached at 9E/E3DDE2B0
Oct 6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG: database system is ready to accept read only connections
Oct 6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized
Oct 6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC: WAL contains references to invalid pages
Oct 6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct 6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG: startup process (PID 19614) was terminated by signal 6: Abort trap
Oct 6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG: terminating any other active server processes

[ Given things were still panic'ing on the same page of data, I enabled full_page_writes on the master, performed a VACUUM and a CHECKPOINT, and began resync'ing data to the slave ]

[ And here I started up the slave again ]
Oct 7 00:32:00 db02 postgres[66904]: [9-1] @ 66904 0: LOG: database system was interrupted; last known up at 2012-10-06 17:48:10 UTC
Oct 7 00:32:00 db02 postgres[66904]: [10-1] @ 66904 0: LOG: entering standby mode
Oct 7 00:32:00 db02 postgres[66904]: [11-1] @ 66904 0: LOG: unexpected pageaddr 89/B9000000 in log file 159, segment 55, offset 0
Oct 7 00:32:00 db02 postgres[66906]: [9-1] [unknown]@[unknown] 66906 0: LOG: connection received: host=[local]
Oct 7 00:32:00 db02 postgres[66906]: [10-1] pgsql@postgres 66906 0: FATAL: the database system is starting up
Oct 7 00:32:00 db02 postgres[66905]: [9-1] @ 66905 0: LOG: streaming replication successfully connected to primary
Oct 7 00:32:01 db02 postgres[66913]: [9-1] [unknown]@[unknown] 66913 0: LOG: connection received: host=[local]
Oct 7 00:32:01 db02 postgres[66913]: [10-1] pgsql@postgres 66913 0: FATAL: the database system is starting up
Oct 7 00:32:01 db02 postgres[66904]: [12-1] @ 66904 0: LOG: redo starts at 9F/36000020
Oct 7 00:32:01 db02 postgres[66904]: [13-1] @ 66904 0: LOG: consistent recovery state reached at 9F/36003FD8
Oct 7 00:32:02 db02 postgres[66897]: [9-1] @ 66897 0: LOG: database system is ready to accept read only connections

[ At this point I initiated a VACUUM and rebuilt a 400GB table using pg_reorg(1) to evict all WAL files that might contain poisonous data ]

And that's about all I know at this point in time. Things have been up and running successfully and without issue ever since.
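The master-side portion of that recovery (full_page_writes back on, generate WAL traffic, checkpoint, re-sync) can be sketched as shell. This is only a sketch: with DRY_RUN=1 the commands are printed rather than executed, and the hostnames and data-directory paths are placeholders, not the real ones.

```shell
# Sketch of the master-side recovery steps described above. DRY_RUN=1 prints
# each command instead of running it; /path/to/pgdata and db02 are placeholders.
DRY_RUN=1
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. After re-enabling full_page_writes in postgresql.conf, reload the config:
run pg_ctl reload -D /path/to/pgdata

# 2. Generate some WAL traffic, then force a checkpoint so the change sticks:
run psql -d dbcluster -c "VACUUM ANALYZE;"
run psql -d dbcluster -c "CHECKPOINT;"

# 3. With the slave stopped, re-sync the data directory, then restart the slave:
run rsync -a --delete /path/to/pgdata/ db02:/path/to/pgdata/
```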
I'm tempted to pg_reorg(1) all of the tables in order to guarantee that everything is 100% intact, since I can't easily VACUUM FULL them (several of the largest, most frequently used tables are several TB in size). LMK if there's anything else you're interested in seeing or having outlined.

Again, the master recovered gracefully, didn't require any manual intervention other than increasing kern.maxfiles, and hasn't skipped a beat since kern.maxfiles was increased. Having the slaves crap out when the master executed a VACUUM, however, is of interest. When running the manual VACUUM, I had windows open to both db01 and db02, and db02's backend panic'ing was clearly linked to the VACUUM warnings on db01.

In the regression testing environments, is it possible to enable a test mode that runs with full_page_writes disabled, in order to push the boundaries of that setting? The IO bandwidth savings are substantial for small mutations, and for folks lucky enough to run on ZFS (i.e. not ufs, ext*), it'd be really cool to be able to realize those savings. As is, it appears that standalone instances of Pg are full_page_writes safe, but that slaves are still making assumptions about full-page WAL entries being the MTU in some corner cases. My observations lead me to believe that replication works fine with full_page_writes disabled until the master cleans up a zero-filled or damaged page.

For the archives (if anyone makes it this far to the bottom of the note), what I did to operationally "fix" things:

1) On the master, re-enabled full_page_writes and HUP'ed Pg to reload the setting.
2) Fired off a VACUUM ANALYZE to generate some WAL data. This step isn't strictly necessary, but I wanted to make sure there was something to CHECKPOINT.
3) Issued a CHECKPOINT and waited until it completed.
4) Re-synced data to the slave (via rsync(1) or whatever your tool of choice is).
5) Restarted the slave.
6) Verified that the slave is in fact replicating via pg_stat_replication.

-sc

--
Sean Chittenden
sean@xxxxxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general