Re: NTP time steps causes cluster reconfiguration

Steven Dake <sdake@xxxxxxxxxx> · Fri, 16 Jul 2010 13:36:32 -0700

On 07/16/2010 06:18 AM, Martin Waite wrote:
Hi,

During testing, I noticed that a time step caused by ntpd caused the
cluster to drop into GATHER state:

Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.

...

This is easily repeatable by setting the clock forwards by 20 seconds
using /bin/date. This probably causes communication timeouts to expire
prematurely, and almost every time it causes the cluster to reconfigure,
luckily without affecting running services.

Stepping the clock backwards also causes a similar disruption, but there
is a long lag between changing the time and the cluster reconfiguring:
perhaps this extends a timeout or sleep on the affected node, causing
genuine timeouts on the other nodes.
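
The two guesses above (a forward step expiring timeouts early, a backward
step extending them) can be illustrated with a small simulation. This is
only a sketch of the general mechanism, not a description of how openais
measures its timeouts: the Deadline and FakeClocks classes, the simulate
helper and the 10 second timeout are all hypothetical names and values
chosen for illustration.

```python
import time


class Deadline:
    """A timeout measured against whatever clock function is supplied."""

    def __init__(self, timeout_s, clock):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def expired(self):
        return self._clock() >= self._expires_at


class FakeClocks:
    """Simulated clocks (hypothetical): 'wall' can be stepped, 'mono' cannot."""

    def __init__(self):
        self.wall = 1_000_000.0
        self.mono = 500.0

    def advance(self, seconds):
        # Real time passing moves both clocks.
        self.wall += seconds
        self.mono += seconds

    def step_wall(self, seconds):
        # An ntpd or /bin/date step moves only the wall clock.
        self.wall += seconds


def simulate(step_s, timeout_s=10.0, elapsed_s=1.0):
    """Return (wall_expired, mono_expired) after elapsed_s of real time
    followed by a wall-clock step of step_s seconds."""
    clocks = FakeClocks()
    wall_deadline = Deadline(timeout_s, lambda: clocks.wall)
    mono_deadline = Deadline(timeout_s, lambda: clocks.mono)
    clocks.advance(elapsed_s)
    clocks.step_wall(step_s)
    return wall_deadline.expired(), mono_deadline.expired()


if __name__ == "__main__":
    # Forward step of 20 s after 1 s of real time: only the wall-clock
    # deadline fires, well before 10 s have really passed.
    print(simulate(20.0))
    # Backward step of 16.332117 s after 12 s of real time: the wall-clock
    # deadline has still not fired although 10 s have really passed.
    print(simulate(-16.332117, elapsed_s=12.0))
    # Outside the simulation, a step-immune deadline would use the real
    # monotonic clock from the standard library.
    print(Deadline(10.0, time.monotonic).expired())
```

If a timeout is computed this way from the wall clock, a forward step
makes it fire early and a backward step delays it by roughly the size of
the step, which would match the long lag seen in the backwards case.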

All I am looking for is some reassurance that clock changes are not
going to crash the cluster. Is anyone able to confirm this, please?

regards,

Martin

Martin,

NTP integration is not available with openais.  That feature has been 
introduced into corosync.

Regards
-steve

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
