Looking at the startup process:
postgres 16749 4.1 6.7 17855104 8914544 ? Ss 18:36 0:44 postgres: startup process recovering 0000000800005B1C00000030
Then a few seconds later:
postgres 16749 4.2 7.0 17855104 9294172 ? Ss 18:36 0:47 postgres: startup process recovering 0000000800005B1C00000047
It's replaying WAL segments from the master, but it always stays three or four segments behind the newest file in pg_xlog, so startup never finishes. Here's a demonstration:
# while :; do echo $(ls data/pg_xlog/ | grep -n $(ps aux | egrep "startup process" | awk '{print $15}')) $(ls data/pg_xlog/ | wc -l); sleep 1; done
# line number:segment being replayed    # total entries in pg_xlog
1655:0000000800005B1C00000064 1659
1656:0000000800005B1C00000065 1660
1658:0000000800005B1C00000067 1661
1659:0000000800005B1C00000068 1662
1660:0000000800005B1C00000069 1663
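The one-liner above reports a line number from ls, and its total also counts non-segment entries such as archive_status. Below is a minimal sketch of a sturdier check. It assumes bash, Linux procps ps, and the process title format shown above, and it counts the WAL segment files that sort after the one currently being replayed. PGXLOG, the function names, and the script name wal_lag.sh are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: print, once, how many WAL segment files in pg_xlog the
# standby's startup process has not reached yet.
# Assumptions: Linux procps ps, the 9.x-style process title
# "postgres: startup process recovering <segment>", and PGXLOG pointing at
# the standby's pg_xlog (the default below is hypothetical).

export LC_ALL=C                    # plain byte ordering for hex file names
PGXLOG="${PGXLOG:-data/pg_xlog}"

# Segment name taken from the startup process title (empty if none).
# The [s] bracket stops grep from matching its own command line.
current_segment() {
  ps -eo args | grep '[s]tartup process' | awk '{print $NF}' | head -n 1
}

# Count WAL segment files (exactly 24 upper-case hex characters) in
# directory $2 that sort after segment $1. Appending "" forces a string
# comparison, so all-digit names are never compared as floating point.
segments_ahead() {
  ls "$2" | grep -E '^[0-9A-F]{24}$' \
    | awk -v cur="$1" '($0 "") > (cur "")' | wc -l
}

# Print one status line for the current replay position.
report_lag() {
  local seg
  seg="$(current_segment)"
  if [ -z "$seg" ]; then
    echo "no startup process found"
    return 0
  fi
  echo "$(date +%T) replaying $seg, $(segments_ahead "$seg" "$PGXLOG") segment(s) ahead"
}

report_lag
```

It prints a single line. To watch it continuously, run it in a loop, for example: while :; do PGXLOG=data/pg_xlog ./wal_lag.sh; sleep 1; done. Filtering on the 24-character segment name pattern keeps archive_status, .history, and .partial files out of the count.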
Usually this resolves itself if I wait, though sometimes that takes a very long time. Is there a configuration option that lets a warm standby start before it has fully replayed the WAL received from the master?
* Note: wal_keep_segments is set to 8192 on these servers, which have large disks (at the default 16 MB segment size that is about 128 GB of WAL). This allows recovery within a couple of hours of a failover without having to restore from the archive.
* This is specifically a problem for pgpool recovery, which fails if the standby can't start within 300 seconds (the default). I'm open to raising that timeout if there's no way around this.