NTP time steps cause cluster reconfiguration

Hi,

 

During testing, I noticed that a time step made by ntpd caused the cluster to drop into the GATHER state:

 

Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.
...

 

This is easy to reproduce by setting the clock forwards by 20 seconds with /bin/date. My guess is that the step makes communication timeouts expire prematurely. Almost every time, the cluster reconfigures, luckily without affecting running services.

 

Stepping the clock backwards causes a similar disruption, but with a long lag between the time change and the cluster reconfiguring. Perhaps the step extends a timeout or sleep on the affected node, which then causes genuine timeouts on the other nodes.
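
To illustrate the mechanism I am guessing at, here is a minimal Python sketch. It is not openais code, and I have not checked how openais actually measures its timeouts. The Deadline and FakeClocks classes, the simulate function and the 10-second timeout are all hypothetical. The sketch compares a deadline computed from the wall clock with one computed from a monotonic clock when the wall clock is stepped.

```python
class Deadline:
    """A timeout measured against an injectable clock function (hypothetical)."""

    def __init__(self, timeout_s, clock):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def expired(self):
        return self._clock() >= self._expires_at


class FakeClocks:
    """Simulated wall and monotonic clocks, so nothing touches the real clock."""

    def __init__(self):
        self.wall_s = 1000.0  # arbitrary starting values
        self.mono_s = 50.0

    def advance(self, seconds):
        # Real time passing moves both clocks.
        self.wall_s += seconds
        self.mono_s += seconds

    def step(self, seconds):
        # A step (ntpd reset or /bin/date) moves only the wall clock.
        self.wall_s += seconds

    def wall(self):
        return self.wall_s

    def mono(self):
        return self.mono_s


def simulate(step_s, elapsed_s, timeout_s=10.0):
    """Return (wall_expired, mono_expired) after a step plus elapsed real time.

    timeout_s is an assumed value, not an openais setting.
    """
    clocks = FakeClocks()
    wall_deadline = Deadline(timeout_s, clocks.wall)
    mono_deadline = Deadline(timeout_s, clocks.mono)
    clocks.step(step_s)
    clocks.advance(elapsed_s)
    return wall_deadline.expired(), mono_deadline.expired()


if __name__ == "__main__":
    # Forward step of 20 s, then 1 s of real time.
    print(simulate(step_s=20.0, elapsed_s=1.0))
    # Backward step of 16.3 s, then 12 s of real time.
    print(simulate(step_s=-16.3, elapsed_s=12.0))
```

In this model the forward step makes the wall-clock deadline fire after only one second of real time, and the backward step keeps it from firing even though more than the timeout has really elapsed. The monotonic deadline is unaffected in both cases. In a real program the same Deadline could be given time.time or time.monotonic as its clock.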

 

All I am looking for is some reassurance that clock changes are not going to crash the cluster. Is anyone able to confirm this, please?

 

regards,

Martin

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
