That fixed it, thanks Scott. I wish there had been more detail about that parameter on the PostgreSQL website (maybe I could add it). I commented out that parameter and successfully restarted the server. This time it remained a standby.
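For reference, the standby's recovery.conf now looks roughly like this (the same settings quoted further down the thread, with just that one parameter commented out):

standby_mode = on
primary_conninfo = 'host=10.32.10.175 port=5432 user=replicator'
primary_slot_name = 'replicant'
trigger_file = '/var/lib/pgsql/9.4/share/trigger_file'
restore_command = 'cp /var/lib/pgsql/9.4/share/%f %p'
archive_cleanup_command = '/usr/pgsql-9.4/bin/pg_archivecleanup /var/lib/pgsql/9.4/share %r'
#recovery_target = 'immediate'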
Thanks again,
Jay
On Thu, Oct 8, 2015 at 10:04 AM, Scott Mead <scottm@xxxxxxxxxxx> wrote:
On Thu, Oct 8, 2015 at 10:59 AM, John Scalia <jayknowsunix@xxxxxxxxx> wrote:

Yep, they're both in different directories and on different ports. Allow me to note that I have both hand-built this file from the sample provided in /usr/pgsql-9.4/share and used pg_basebackup's -R option to generate it automatically. It doesn't seem to make a difference. Here's the recovery.conf file I'm using on the standby:

restore_command = 'cp /var/lib/pgsql/9.4/share/%f %p'          # e.g. 'cp /mnt/server/archivedir/%f %p'
archive_cleanup_command = '/usr/pgsql-9.4/bin/pg_archivecleanup /var/lib/pgsql/9.4/share %r'
#recovery_end_command = ''
#recovery_target_name = ''      # e.g. 'daily backup 2011-01-26'
#recovery_target_time = ''      # e.g. '2004-07-14 22:39:00 EST'
#recovery_target_xid = ''
#recovery_target_inclusive = true
recovery_target = 'immediate'

I believe you should omit recovery_target. From http://www.postgresql.org/docs/9.4/static/recovery-target-settings.html:

recovery_target = 'immediate'
    This parameter specifies that recovery should end as soon as a consistent state is reached, i.e. as early as possible. When restoring from an online backup, this means the point where taking the backup ended.
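Once that parameter is removed, a quick way to confirm the server is actually staying in recovery is something like the following (5433 is only an example; use whatever port your standby listens on):

psql -p 5433 -c "SELECT pg_is_in_recovery();"

It should keep returning 't' on the standby and 'f' on the primary.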
--
Scott Mead
Sr. Architect
OpenSCG
PostgreSQL, Java & Linux Experts

pause_at_recovery_target = false
standby_mode = on
primary_conninfo = 'host=10.32.10.175 port=5432 user=replicator'        # e.g. 'host=localhost port=5432'
primary_slot_name = 'replicant'
trigger_file = '/var/lib/pgsql/9.4/share/trigger_file'
#recovery_min_apply_delay = 0

I have also gone through the master's postgresql.conf file several times looking for problems, and I've verified the replication slot exists.

Thanks for your assistance,
Jay

On Thu, Oct 8, 2015 at 9:51 AM, Keith Fiske <keith@xxxxxxxxxx> wrote:

On Thu, Oct 8, 2015 at 9:24 AM, John Scalia <jayknowsunix@xxxxxxxxx> wrote:

Hi all,

We have an ongoing discussion here about retiring Slony and implementing streaming synchronous replication. As part of this effort, I've been trying to build a small cluster with one primary and one standby. Yes, I know about the potential problems with a single standby, but this is only to provide configuration parameters to my sysadmins.

What I've tried is building the two servers on a single sandbox machine. The primary has been running correctly for some time. The standby starts, but quickly turns itself into a primary without any trigger file being present. My process is (a rough command sketch follows the list):

1) build the new standby in a different directory with pg_basebackup
2) edit the standby's postgresql.conf so the standby has a different port
3) start the standby
4) immediately after starting the standby, I see a recovery.done file in that PGDATA dir
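Roughly, the first three steps amount to something like this (the standby's data directory and port below are placeholders; the host and user come from the primary_conninfo above):

# 1) take a base backup and have it write a recovery.conf (-R)
/usr/pgsql-9.4/bin/pg_basebackup -h 10.32.10.175 -p 5432 -U replicator \
    -D /var/lib/pgsql/9.4/standby -R -X stream -P

# 2) give the standby its own port (a later setting overrides an earlier one)
echo "port = 5433" >> /var/lib/pgsql/9.4/standby/postgresql.conf

# 3) start the standby
/usr/pgsql-9.4/bin/pg_ctl -D /var/lib/pgsql/9.4/standby start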
Here's part of the debug2-level log file from that standby, but I'm not seeing why this standby is becoming a primary, only that it is:

[2015-10-08 07:18:19.448 CDT] @: LOG: 00000: database system is ready to accept read only connections
[2015-10-08 07:18:19.448 CDT] @: LOCATION: sigusr1_handler, postmaster.c:4857
cp: cannot stat `/var/lib/pgsql/9.4/share/00000002.history': No such file or directory
[2015-10-08 07:18:19.477 CDT] @: DEBUG: 00000: could not restore file "00000002.history" from archive: child process exited with exit code 1
[2015-10-08 07:18:19.477 CDT] @: LOCATION: RestoreArchivedFile, xlogarchive.c:304
[2015-10-08 07:18:19.477 CDT] @: LOG: 00000: selected new timeline ID: 2
[2015-10-08 07:18:19.477 CDT] @: LOCATION: StartupXLOG, xlog.c:7107
cp: cannot stat `/var/lib/pgsql/9.4/share/00000001.history': No such file or directory
[2015-10-08 07:18:19.485 CDT] @: DEBUG: 00000: could not restore file "00000001.history" from archive: child process exited with exit code 1
[2015-10-08 07:18:19.485 CDT] @: LOCATION: RestoreArchivedFile, xlogarchive.c:304
[2015-10-08 07:18:19.792 CDT] @: LOG: 00000: archive recovery complete
[2015-10-08 07:18:19.792 CDT] @: LOCATION: exitArchiveRecovery, xlog.c:5417
[2015-10-08 07:18:19.798 CDT] @: DEBUG: 00000: performing replication slot checkpoint
[2015-10-08 07:18:19.798 CDT] @: LOCATION: CheckPointReplicationSlots, slot.c:794
[2015-10-08 07:18:19.814 CDT] @: DEBUG: 00000: attempting to remove WAL segments older than log file 0000000000000006000000D8
[2015-10-08 07:18:19.814 CDT] @: LOCATION: RemoveOldXlogFiles, xlog.c:3775
[2015-10-08 07:18:19.814 CDT] @: DEBUG: 00000: SlruScanDirectory invoking callback on pg_multixact/offsets/0000
[2015-10-08 07:18:19.814 CDT] @: LOCATION: SlruScanDirectory, slru.c:1307
[2015-10-08 07:18:19.815 CDT] @: DEBUG: 00000: SlruScanDirectory invoking callback on pg_multixact/members/0000
[2015-10-08 07:18:19.815 CDT] @: LOCATION: SlruScanDirectory, slru.c:1307
[2015-10-08 07:18:19.815 CDT] @: DEBUG: 00000: SlruScanDirectory invoking callback on pg_multixact/offsets/0000
[2015-10-08 07:18:19.815 CDT] @: LOCATION: SlruScanDirectory, slru.c:1307
[2015-10-08 07:18:19.893 CDT] @: DEBUG: 00000: attempting to remove WAL segments newer than log file 0000000200000006000000E1
[2015-10-08 07:18:19.893 CDT] @: LOCATION: RemoveNonParentXlogFiles, xlog.c:5458
[2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: oldest MultiXactId member is at offset 1
[2015-10-08 07:18:19.899 CDT] @: LOCATION: SetOffsetVacuumLimit, multixact.c:2677
[2015-10-08 07:18:19.899 CDT] @: LOG: 00000: MultiXact member wraparound protections are now enabled
[2015-10-08 07:18:19.899 CDT] @: LOCATION: DetermineSafeOldestOffset, multixact.c:2587
[2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: MultiXact member stop limit is now 4294914944 based on MultiXact 1
[2015-10-08 07:18:19.899 CDT] @: LOCATION: DetermineSafeOldestOffset, multixact.c:2590
[2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: release all standby locks
[2015-10-08 07:18:19.899 CDT] @: LOCATION: StandbyReleaseAllLocks, standby.c:666
[2015-10-08 07:18:19.903 CDT] @: LOG: 00000: starting background worker process "powa"
[2015-10-08 07:18:19.904 CDT] @: LOCATION: do_start_bgworker, postmaster.c:5412
[2015-10-08 07:18:19.905 CDT] @: LOG: 00000: database system is ready to accept connections

The one thing that's a little weird about this configuration is that both the primary and the standby are on the same system, and I've never done a configuration this way before. Usually these are on different systems, but this was done just to test some configuration parameters. Can a standby and a primary not exist together on the same system? In any event, there is no trigger file as specified in the recovery.conf on this system. So, why is promotion occurring?
--
Jay

As long as they're on different ports and using different data directories, you should be fine.
Can you share the contents of your recovery.conf file?
--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com