Hey,

I think I know what the problem is. After the first failover, when I clone the old master to be a standby with the 'repmgr standby clone' command, nothing seems to update the repl_nodes table with the new standby in my cluster, so on the next failover repmgrd fails to find a standby to promote.

I confirmed this by manually updating the repl_nodes table after the clone so that the old master is now recorded as a standby.

Now my question is: where is repl_nodes supposed to be updated about the new standby server after I issue 'repmgr standby clone'?

Best regards,
Aviel Buskila

2015-08-16 12:11 GMT+03:00 Aviel Buskila <aviel33@xxxxxxxxx>:

hey,
I have tried setting up the configuration all over again. Now the status of 'repl_nodes' before the failover is:
 id |  type   | upstream_node_id |   cluster    | name  |                    conninfo                    | priority | active
----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
  1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
  2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
  3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t
repmgr is started on node2 and node3 (standby and witness). Now when I kill the postgres master process, I can see the following messages in the repmgrd log:
[WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
and then, when it tried to elect node2 for promotion, it showed the following messages:
[DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
[WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
[ERROR] unable to determine a valid master node, terminating...
[INFO] repmgrd terminating..
what am I doing wrong?
On 14/08/15 at 04:14, Aviel Buskila wrote:
> Hey,
> yes I did .. and still it won't fail back..
Can you send over the output of "repmgr cluster show" before and after
the failover process?
The output of SELECT * FROM repmgr_schema.repl_nodes; after the failover
(you need to change repmgr_schema with what you have configured).
Also, which version of repmgr are you running?
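A minimal sketch of collecting all of that in one go. The config file path, the schema name (`repmgr_cluster_name`, following the usual `repmgr_<cluster>` naming) and running it against node2 are assumptions, not details confirmed in this thread, so adjust them to your setup. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Assumed values, not taken from the thread: adjust to your installation.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/repmgr.conf}"
REPMGR_SCHEMA="${REPMGR_SCHEMA:-repmgr_cluster_name}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

collect_diagnostics() {
    # Cluster state as repmgr sees it (run once before and once after failover).
    run repmgr -f "$REPMGR_CONF" cluster show
    # Raw contents of the node table.
    run psql -h node2 -p 5432 -U repmgr -d repmgr \
        -c "SELECT * FROM ${REPMGR_SCHEMA}.repl_nodes ORDER BY id;"
    # Installed repmgr version.
    run repmgr --version
}

collect_diagnostics
```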
> 2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen <jony.cohenjo@xxxxxxxxx>:
>
>> Hi, did you make the old master follow the new one using repmgr?
>>
>> It doesn't update itself automatically...
>> From the looks of it repmgr thinks you have 2 masters - the old one
>> offline and the new one online.
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
The clone command just clones the data from node2 to node1; you also need to register it with the `force` option to override the old record (as if you were building a new replica node...).
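Roughly like this, as a sketch only: the config file path is an assumption, and the exact option spelling may differ between repmgr versions, so check `repmgr --help` on your install. It is meant to be run on node1 once the clone has finished and PostgreSQL is running there as a standby. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Assumed path, not taken from the thread: adjust to your installation.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/repmgr.conf}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

reregister_old_master() {
    # Overwrite the stale "master" record for this node with a standby record.
    run repmgr -f "$REPMGR_CONF" --force standby register
    # Check that the node is now listed as a standby.
    run repmgr -f "$REPMGR_CONF" cluster show
}

reregister_old_master
```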
see:
Regards,
- Jony
On Sun, Aug 16, 2015 at 3:19 PM, Aviel Buskila <aviel33@xxxxxxxxx> wrote: