Hello,
I'm seeing behavior with a trigger file on a standby that is very confusing to me, and I'd like some advice on what the expected behavior of the trigger file is.
1. Set up replication, with the standby using the following recovery.conf:
# we use wal-e
restore_command = 'wal-e wal-fetch "%f" "%p"'
standby_mode = 'true'
trigger_file = '/my/path/to/trigger-file/STANDBY_OFF'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=myhost port=5432 user=foo password=verysecurepassword'
2. Create the trigger file while the standby is lagging (at this point it is not streaming, but restoring WAL through file-based log shipping).
3. Postgres does not seem to notice the trigger file at all; the standby keeps replaying/recovering WAL.
* I raised the log level to DEBUG5 to see whether Postgres was doing anything, but the log says nothing about a trigger file.
* I also tried restarting Postgres, sending SIGUSR1, etc. to see if that helps, but it just keeps replaying WAL.
4. Once the standby has caught up with the leader (it has replayed all WAL and is about to switch, or has just switched, to streaming replication), Postgres finally notices the trigger file and performs the failover.
> To trigger failover of a log-shipping standby server, run pg_ctl promote or create a trigger file with the file name and path specified by the trigger_file setting in recovery.conf.
So, I'd expect the standby to trigger a failover as soon as we create the trigger file at step 2. However, the failover doesn't happen until step 4 above, and the time between step 2 and step 4 can sometimes be many hours.
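In case it helps anyone reproduce this, here is a minimal sketch of how the delay between creating the trigger file and the actual promotion could be measured. It is only an illustration under stated assumptions: the function name measure_promotion_delay and the is_in_recovery callable are hypothetical, and the callable is assumed to wrap something like running SELECT pg_is_in_recovery() against the standby.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def measure_promotion_delay(
    trigger_path: str,
    is_in_recovery: Callable[[], bool],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Create the trigger file, then poll until the standby leaves recovery.

    `is_in_recovery` is a hypothetical caller-supplied function; it is
    assumed to return the result of `SELECT pg_is_in_recovery()` on the
    standby. Returns the elapsed seconds until promotion was observed,
    or None if `timeout` seconds pass first.
    """
    # Step 2 of the report: create the file named by trigger_file.
    Path(trigger_path).touch()
    start = time.monotonic()

    while time.monotonic() - start < timeout:
        if not is_in_recovery():
            # The standby has been promoted; report how long it took.
            return time.monotonic() - start
        sleep(poll_interval)

    # Still in recovery when the timeout expired.
    return None
```

With the path from the recovery.conf above, this would be called as measure_promotion_delay('/my/path/to/trigger-file/STANDBY_OFF', check), where check is whatever function queries the standby. A return value of None means the standby was still replaying WAL when the timeout expired.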
I've reproduced this with Postgres 9.4 and 9.5, and I'm currently trying to reproduce it with 10.
Please let me know if there is any other information I could provide.
Thanks!
Keiko Oda