Hi,
Say, with PostgreSQL 9.6.2, a hot standby fails to start streaming from the primary, with either:
FATAL: could not start WAL streaming: ERROR: replication slot "test3" does not exist
or
FATAL: could not connect to the primary server: FATAL: the database system is starting up
Is there a way to reduce the time until the next attempt? I assumed, apparently wrongly, that wal_retrieve_retry_interval controls this, but changing it makes no difference: I set it to 3s, and the standby still waits about 15s before retrying. Here are two log samples:
1.
< 2017-05-27 20:12:12.137 CEST > LOG: consistent recovery state reached at 0/8000328
< 2017-05-27 20:12:12.137 CEST > LOG: invalid record length at 0/8000328: wanted 24, got 0
< 2017-05-27 20:12:12.137 CEST > LOG: database system is ready to accept read only connections
< 2017-05-27 20:12:12.142 CEST > FATAL: could not connect to the primary server: FATAL: the database system is starting up
< 2017-05-27 20:12:27.208 CEST > LOG: fetching timeline history file for timeline 4 from primary server
< 2017-05-27 20:12:27.212 CEST > LOG: started streaming WAL from primary at 0/8000000 on timeline 3
< 2017-05-27 20:12:27.212 CEST > LOG: replication terminated by primary server
< 2017-05-27 20:12:27.212 CEST > DETAIL: End of WAL reached on timeline 3 at 0/8000328.
2.
< 2017-05-26 19:17:48.462 CEST > LOG: database system was shut down in recovery at 2017-05-26 19:17:20 CEST
< 2017-05-26 19:17:48.462 CEST > LOG: entering standby mode
< 2017-05-26 19:17:48.463 CEST > LOG: consistent recovery state reached at 0/8000398
< 2017-05-26 19:17:48.463 CEST > LOG: invalid record length at 0/8000398: wanted 24, got 0
< 2017-05-26 19:17:48.464 CEST > LOG: database system is ready to accept read only connections
< 2017-05-26 19:17:48.470 CEST > FATAL: could not start WAL streaming: ERROR: replication slot "test3" does not exist
< 2017-05-26 19:18:03.495 CEST > LOG: fetching timeline history file for timeline 4 from primary server
< 2017-05-26 19:18:03.498 CEST > LOG: started streaming WAL from primary at 0/8000000 on timeline 3
< 2017-05-26 19:18:03.498 CEST > LOG: replication terminated by primary server
< 2017-05-26 19:18:03.498 CEST > DETAIL: End of WAL reached on timeline 3 at 0/8000398.
-bash-4.2$ psql
psql (9.6.2)
Type "help" for help.
postgres=# show wal_retrieve_retry_interval;
 wal_retrieve_retry_interval
-----------------------------
 3s
(1 row)
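To put a number on the delay, here is a small Python sketch that subtracts the timestamp of each FATAL line from the timestamp of the next line logged in the two samples above. It is not part of PostgreSQL; the function name retry_gap_seconds is made up for illustration, and it assumes that the first line logged after the FATAL marks the next connection attempt.

```python
from datetime import datetime

# Timestamps copied from the two log samples: the FATAL line and the
# next line the standby logged afterwards (time zone omitted, both CEST).
SAMPLES = {
    "sample 1": ("2017-05-27 20:12:12.142", "2017-05-27 20:12:27.208"),
    "sample 2": ("2017-05-26 19:17:48.470", "2017-05-26 19:18:03.495"),
}

FMT = "%Y-%m-%d %H:%M:%S.%f"


def retry_gap_seconds(failed_at, next_at):
    """Return the seconds elapsed between two log timestamps."""
    delta = datetime.strptime(next_at, FMT) - datetime.strptime(failed_at, FMT)
    return delta.total_seconds()


for name, (failed_at, next_at) in SAMPLES.items():
    print(f"{name}: {retry_gap_seconds(failed_at, next_at):.3f}s")
```

This prints 15.066s for the first sample and 15.025s for the second, so in both cases the wait is about 15s rather than the configured 3s.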
Thanks
Ludovic