Re: What is the reason which the node in which failure has not occurred carries out "lost"?


On 12/02/14 05:27 AM, yusuke iida wrote:
Hi, Digimer

In this test environment, stonith was disabled, so the node
that was lost once joined again.

It may or may not be related, but _please_ set up stonith, even if it's a test environment. The cluster software treats all clusters as "production", and thus functions like stonith are required for predictable operation.

token is set to the default, "1000 ms".
I do not know whether this value is optimal for a configuration
of 14 nodes.
How should the token value be chosen for a given number of nodes?
Is there any guideline?

With so many nodes, that is probably a sensible value.

From the following logs, it seems that they joined again immediately
after some nodes were lost.

Jan 31 14:27:41 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43448) was formed. Members joined: -1062705788
-1062705787 -1062705786 -1062705785 -1062705784 -1062705783
-1062705782 left: -1062705788 -1062705787 -1062705786 -1062705785
-1062705784 -1062705783 -1062705782 -1062705777 -1062705775
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43592) was formed. Members joined: -1062705788
-1062705787 -1062705786 -1062705785 -1062705784 -1062705783
-1062705782 left: -1062705788 -1062705787 -1062705786 -1062705785
-1062705784 -1062705783 -1062705782 -1062705776
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43596) was formed. Members joined: -1062705776
-1062705775
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43604) was formed. Members joined: -1062705788
-1062705775 left: -1062705788 -1062705775
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43612) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43620) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43628) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43636) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43644) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43652) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43660) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43668) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43676) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43684) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43692) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43700) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:44 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43704) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:45 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43712) was formed. Members joined: -1062705777

Regards,
Yusuke

Without looking too closely, my first guess is that corosync on the failed/recovered node is out of sync with the rest of the cluster. Does it rejoin properly after a reboot? If so, stonith would prevent this problem.
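
A side note on reading the log above: the negative member IDs look like auto-generated node IDs, which corosync derives from the ring0 IPv4 address when no nodeid is configured, printed here as signed 32-bit integers. Under that assumption, -1062705788 decodes to 192.168.101.132, the same address that appears in the membership ring ID. A minimal Python sketch of the conversion (the function name nodeid_to_ip is my own, not part of corosync):

```python
import ipaddress


def nodeid_to_ip(nodeid: int) -> str:
    """Decode a node ID logged as a signed 32-bit integer into dotted-quad form.

    Assumption: the ID was auto-generated from the node's IPv4 address.
    An explicitly configured nodeid will not decode to a meaningful address.
    """
    # Reinterpret the signed value as an unsigned 32-bit integer.
    unsigned = nodeid & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(unsigned))


if __name__ == "__main__":
    # A few of the IDs that appear in the membership messages above.
    for nodeid in (-1062705788, -1062705777, -1062705776, -1062705775):
        print(nodeid, "->", nodeid_to_ip(nodeid))
```

Mapping the IDs back to addresses this way makes it easier to see which hosts keep leaving and rejoining in each membership change.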

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss



