Hi,

During testing, I noticed that a time step made by ntpd caused the cluster to drop into GATHER state:

Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.
...

This is easily reproduced by setting the clock forward by 20 seconds with /bin/date. This probably causes communication timeouts to expire prematurely, and almost every time it causes the cluster to reconfigure, luckily without affecting running services.

Stepping the clock backwards causes a similar disruption, but there is a long lag between changing the time and the cluster reconfiguring. Perhaps this extends a timeout or sleep on the affected node, which then causes genuine timeouts on the other nodes.
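To make the two hypotheses above concrete, here is a minimal simulation, assuming (not confirmed anywhere in this thread) that a timer deadline is computed from the wall clock. It arms a timeout, steps the wall clock the way ntpd or /bin/date would, and compares the time left against a deadline taken from a monotonic clock, which only advances with real time. TIMEOUT_S and the step sizes are hypothetical values for illustration, not openais/totem settings; in real Python code the monotonic source would be time.monotonic().

```python
"""Hypothetical sketch: how a wall-clock step distorts a timeout deadline.

Assumption: the timer deadline is derived from wall-clock time. The timeout
value below is illustrative only and is not an openais/totem setting.
"""

TIMEOUT_S = 10.0  # hypothetical timeout length, in seconds


def remaining(deadline: float, now: float) -> float:
    """Seconds left before the deadline, clamped at zero (0.0 means 'fired')."""
    return max(0.0, deadline - now)


def simulate(step_s: float, elapsed_s: float) -> tuple[float, float]:
    """Return (wall-clock remaining, monotonic remaining).

    Both clocks read 0 when the timer is armed. After elapsed_s real seconds
    the wall clock has also been stepped by step_s, while the monotonic clock
    has advanced by elapsed_s only.
    """
    wall_deadline = 0.0 + TIMEOUT_S
    mono_deadline = 0.0 + TIMEOUT_S
    wall_now = elapsed_s + step_s  # wall clock includes the step
    mono_now = elapsed_s           # monotonic clock ignores the step
    return remaining(wall_deadline, wall_now), remaining(mono_deadline, mono_now)


if __name__ == "__main__":
    # Forward step of 20 s, 1 s after arming: the wall-clock timer fires at once.
    print(simulate(step_s=20.0, elapsed_s=1.0))  # (0.0, 9.0)
    # Backward step like the ntpd reset in the log: the wall-clock timer now
    # waits roughly 16 s longer than intended.
    print(simulate(step_s=-16.332117, elapsed_s=1.0))
```

A forward step makes the wall-clock deadline look already passed (the premature expiry seen with /bin/date), while a backward step pushes it out by the size of the step, which would match the delayed reconfiguration if the stalled node stops sending in time for its peers.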
All I am looking for is some reassurance that clock changes are not going to crash the cluster. Is anyone able to confirm this, please?

Regards,
Martin
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster