NTP time steps cause cluster reconfiguration

"Martin Waite" <Martin.Waite@xxxxxxxxxxxx> · Fri, 16 Jul 2010 14:18:22 +0100

Hi,

During testing, I noticed that a clock step applied by ntpd caused the cluster
to drop into the GATHER state:

Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.

...

This is easily reproduced by setting the clock forward by 20 seconds with
/bin/date.  The step probably makes communication timeouts expire prematurely,
and almost every time it causes the cluster to reconfigure, luckily without
affecting running services.

Stepping the clock backwards causes a similar disruption, but with a long lag
between the time change and the reconfiguration:  perhaps the step extends a
timeout or sleep on the affected node, causing genuine timeouts on the other
nodes.
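
To make the suspected mechanism concrete, here is a minimal Python sketch of a
deadline computed from a clock that can be stepped. It is not openais code, and
it assumes (as guessed above, not confirmed) that the affected timers are
derived from wall-clock time. The names Deadline and FakeWallClock are
hypothetical and exist only for this illustration.

```python
import time


class Deadline:
    """Timeout that expires `timeout` seconds after creation, as seen by `clock`.

    Hypothetical helper for illustration; not part of openais.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout

    def expired(self):
        return self._clock() >= self._expires_at


class FakeWallClock:
    """Hypothetical stand-in for a wall clock that an admin or ntpd can step."""

    def __init__(self, now=1000.0):
        self.now = now

    def __call__(self):
        return self.now

    def step(self, seconds):
        # Positive values step the clock forward, negative values backward.
        self.now += seconds


if __name__ == "__main__":
    # Forward step: a 5 s timeout expires although no real time has passed.
    wall = FakeWallClock()
    deadline = Deadline(5.0, clock=wall)
    wall.step(20.0)
    print("expired after forward step:", deadline.expired())

    # Backward step: the same timeout is still pending after 5 s of "real" time.
    wall = FakeWallClock()
    deadline = Deadline(5.0, clock=wall)
    wall.step(-16.332117)  # size of the ntpd step in the log above
    wall.step(5.0)         # 5 s elapse
    print("expired 5 s after backward step:", deadline.expired())

    # With the default time.monotonic clock, stepping the system time does not
    # move the deadline, because that clock is unaffected by clock updates.
    print("monotonic deadline expired immediately:", Deadline(60.0).expired())
```

In this model a forward step fires the timeout at once, while a backward step
delays it by the size of the step, which would match the long lag seen when
stepping backwards. Timers based on a monotonic clock would not be affected in
either direction.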

All I am looking for is some reassurance that clock changes are not going to
crash the cluster.  Is anyone able to confirm this, please?

regards,

Martin
