Master/slave replication, the master and slave both have full_page_writes disabled because the system is running on ZFS. While the master was still being setup and data was being loaded, the system ran out of open file descriptors and Pg crashed a few times until this was realized and corrected. Streaming replication was setup and was correctly passing data to the slave: repmgr@db01=# select * FROM pg_stat_replication ; pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state -------+----------+---------+------------------+-----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------ 68057 | 16384 | repmgr | walreceiver | 192.168.100.183 | | 60743 | 2012-10-06 16:41:36.963769+00 | streaming | 9E/E2C81210 | 9E/E2C81210 | 9E/E2C81210 | 9E/E2C80C68 | 0 | async (1 row) Once the host was up and running, I kicked off a manual VACUUM, at which point I see a bunch of the following errors on the master: INFO: vacuuming "schma.tbl" WARNING: relation "tbl" page 65305 is uninitialized --- fixing WARNING: relation "tbl" page 70927 is uninitialized --- fixing WARNING: relation "tbl" page 75286 is uninitialized --- fixing WARNING: relation "tbl" page 86601 is uninitialized --- fixing WARNING: relation "tbl" page 91434 is uninitialized --- fixing WARNING: relation "tbl" page 91435 is uninitialized --- fixing WARNING: relation "tbl" page 95639 is uninitialized --- fixing WARNING: relation "tbl" page 105638 is uninitialized --- fixing WARNING: relation "tbl" page 114550 is uninitialized --- fixing WARNING: relation "tbl" page 118380 is uninitialized --- fixing WARNING: relation "tbl" page 121478 is uninitialized --- fixing WARNING: relation "tbl" page 122098 is uninitialized --- fixing WARNING: relation "tbl" page 127195 is uninitialized --- fixing WARNING: relation "tbl" page 152416 is uninitialized --- fixing WARNING: relation "tbl" page 230192 is uninitialized --- fixing WARNING: relation "tbl" page 241091 is uninitialized --- fixing INFO: scanned index "tbl_pkey" to remove 447 row versions and on the slave I see: Oct 6 17:04:02 db02 postgres[4535]: [14-1] @ 4535 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized Oct 6 17:04:02 db02 postgres[4535]: [14-2] @ 4535 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403 Oct 6 17:04:02 db02 postgres[4535]: [15-1] @ 4535 0: PANIC: WAL contains references to invalid pages Oct 6 17:04:02 db02 postgres[4535]: [15-2] @ 4535 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403 Oct 6 17:04:02 db02 postgres[4529]: [10-1] @ 4529 0: LOG: startup process (PID 4535) was terminated by signal 6: Abort trap Oct 6 17:04:02 db02 postgres[4529]: [11-1] @ 4529 0: LOG: terminating any other active server processes Oct 6 17:04:02 db02 postgres[8345]: [26-1] pgsql@repmgr 8345 0: WARNING: terminating connection because of crash of another server process Oct 6 17:04:02 db02 postgres[8345]: [26-2] pgsql@repmgr 8345 0: DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. Oct 6 17:04:02 db02 postgres[8345]: [26-3] pgsql@repmgr 8345 0: HINT: In a moment you should be able to reconnect to the database and repeat your command. Restarting the slave, I see: Oct 6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG: consistent recovery state reached at 9E/E3DDE2B0 Oct 6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG: database system is ready to accept read only connections Oct 6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized Oct 6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403 Oct 6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC: WAL contains references to invalid pages Oct 6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403 Oct 6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG: startup process (PID 15191) was terminated by signal 6: Abort trap Oct 6 17:12:03 db02 postgres[15185]: [11-1] @ 15185 0: LOG: terminating any other active server processes At this point in time, I'm suspecting full_page_writes=off is the culprit and it's violating logic in several places. I haven't tracked things down further, but I thought I'd at least bring this to people's attention so that the appropriate warnings can be put in place. The slave can not be recovered with the corrupted WAL entries, however, and they need to be resynced with full_page_writes enabled (at least that's my assumption at present): Oct 6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG: consistent recovery state reached at 9E/E3DDE2B0 Oct 6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG: database system is ready to accept read only connections Oct 6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING: page 2943 of relation base/16387/20196 is uninitialized Oct 6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403 Oct 6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC: WAL contains references to invalid pages Oct 6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403 Oct 6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG: startup process (PID 19614) was terminated by signal 6: Abort trap Oct 6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG: terminating any other active server processes Cheers. I'll report back if I dig anything else up. -sc [ plz CC as I'm not subscribed to the list ]] -- Sean Chittenden sean@xxxxxxxxxxxxxx -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general