On Tue, 13 Aug 2019 16:38:44 +0530 Shital A <brightuser2019@xxxxxxxxx> wrote: > On Tue, 13 Aug 2019, 11:50 Shital A, <brightuser2019@xxxxxxxxx> wrote: > > > On Mon, 12 Aug 2019, 18:54 Adrien Nayrat, <adrien.nayrat@xxxxxxxxxxxx> > > wrote: > > > >> On 8/12/19 2:57 PM, Shital A wrote: > >> > Postgres version : 9.6 > >> > OS:Rhel 7.6 > >> > > >> > We are working on HA setup for postgres cluster of two nodes in > >> > active-passive mode. > >> > > >> > Installed: > >> > Pacemaker 1.1.19 > >> > Corosync 2.4.3 > >> > > >> > The pacemaker agent with this installation doesn't support automatic > >> > failback. > >> > What I mean by that is explained below: > >> > 1. Cluster is setup like A - B with A as master. > >> > 2. Kill services on A, node B will come up as master. > >> > 3. node A is ready to join the cluster, we have to delete the lock file > >> > it creates on any one of the node and execute the cleanup command to > >> > get the node back as standby > >> > > >> > Step 3 is manual so HA is not achieved in real sense. > >> > > >> > Please help to check: > >> > 1. Is there any version of the resouce agent which supports automatic > >> > failback? > >> > To avoid generation of lock file and deleting it. > >> > > >> > 2. If there is no such support, what checks should be added in pgsql RA > >> > to achieve > >> > >> Which RA did you use? AFAIK there is two RA : > >> - pgsql : > >> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql > >> - PAF : http://clusterlabs.github.io/PAF/ (Which is different from pgsql > >> RA : > >> http://clusterlabs.github.io/PAF/FAQ.html#why-new-ra-for-postgresql) > >> > >> If I am right, PAF should handle the case when there no need to perform > >> pgrewind > >> (if the old master is not too advanced in the transaction log). PAF only make sure controlled failover (aka switchover) performs safely so the old master hook back as a standby correctly. Should a failure occurs with an automatic failover, PAF has no dark magic to automagicaly failoback the failing master as a standby. > > We are using the pgsql RA installed with pacemaker. > > > > Is PAF recommended over pgsql? Yes, as much as the historical RA. > > Has anyone changed pgsql to handle the lock file and other cases that > > might not have been handled? I don't know. However, I do think it would be either presumptuous to climb this road today. > Looks like the previous email bounced, received undelivered message. > Sending again. We are using the pgsql RA installed with pacemaker and seems > like it prevents failback by creating a lock file. We want to implement > auto failback and need pointers on how to implement. The lock file is here because PgSQL has no safety belt against data corruption when an old master hook on a new one while its LSN was further away from the timeline fork. You must take care of this by yourself. Moreover, should a failover occurs, you should probably check WHY it occurred and fix things before performing a blindly failback of a potential wounded instance. A wounded instance can hurt your cluster in many different and inconfortable ways. Last, as Adrien wrote, you can manually use pg_rewind to revert the old master to the timeline fork, as far as you enabled checksums or 'wal_log_hints=on' and kept enough WAL around. Another option is just to make sure you are able to build a new standby as was fast as possible. But remember: if your data is important, use quorum, fencing and watchdog. Good luck.