Re: Why does a node on which no failure has occurred get marked as "lost"?

Hi, Digimer,

In this test environment, since stonith was disabled, a node that was
lost once joined again.
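
For reference, fencing was turned off through the Pacemaker cluster
property; the exact command below is an assumption on my part, but it
is the usual way to do it:

    # disable fencing cluster-wide (test environments only)
    crm configure property stonith-enabled=false
    # or equivalently with pcs:
    pcs property set stonith-enabled=false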


The token timeout is left at the default of 1000 ms.
I do not know whether this value is optimal for a cluster of 14 nodes.
How should the token value be chosen for a given number of nodes, and
is there any guideline?
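
For reference, the relevant part of corosync.conf would be a minimal
sketch like the following; the token_coefficient line is an assumption
based on the corosync 2.x default (650 ms), and, if I read
corosync.conf(5) correctly, when a nodelist with at least three nodes
is configured the effective timeout is scaled automatically:

    totem {
        version: 2
        # base token timeout in milliseconds (default 1000)
        token: 1000
        # corosync 2.x scales the effective timeout with cluster size:
        # effective = token + (number_of_nodes - 2) * token_coefficient
        token_coefficient: 650
    }

If that scaling applies here, 14 nodes would already give
1000 + 12 * 650 = 8800 ms, which is part of what I would like to
confirm.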

From the following logs, it seems the nodes join again immediately
after they are lost (a note on decoding the negative member IDs
follows the logs).

Jan 31 14:27:41 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43448) was formed. Members joined: -1062705788
-1062705787 -1062705786 -1062705785 -1062705784 -1062705783
-1062705782 left: -1062705788 -1062705787 -1062705786 -1062705785
-1062705784 -1062705783 -1062705782 -1062705777 -1062705775
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43592) was formed. Members joined: -1062705788
-1062705787 -1062705786 -1062705785 -1062705784 -1062705783
-1062705782 left: -1062705788 -1062705787 -1062705786 -1062705785
-1062705784 -1062705783 -1062705782 -1062705776
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43596) was formed. Members joined: -1062705776
-1062705775
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43604) was formed. Members joined: -1062705788
-1062705775 left: -1062705788 -1062705775
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43612) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43620) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43628) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:42 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43636) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43644) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43652) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43660) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43668) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43676) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43684) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43692) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:43 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43700) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:44 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43704) was formed. Members joined: -1062705788 left:
-1062705788
Jan 31 14:27:45 vm11 corosync[2421]:  [TOTEM ] A new membership
(192.168.101.132:43712) was formed. Members joined: -1062705777
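
As an aside on reading these logs: the member IDs are 32-bit node IDs
printed as signed integers. Assuming the nodeid was derived from each
node's IPv4 address (the corosync default for IPv4 when no nodeid is
set), they can be decoded back to addresses with a small Python
sketch:

    import socket
    import struct

    def nodeid_to_ip(nodeid):
        # reinterpret the signed 32-bit nodeid as an unsigned IPv4 address
        return socket.inet_ntoa(struct.pack("!I", nodeid & 0xFFFFFFFF))

    print(nodeid_to_ip(-1062705788))  # -> 192.168.101.132

So -1062705788, the member repeatedly joining and leaving above,
corresponds to 192.168.101.132.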

Regards,
Yusuke


2014-02-12 13:53 GMT+09:00 Digimer <lists@xxxxxxxxxx>:
> On 11/02/14 11:40 PM, yusuke iida wrote:
>>
>> Hi, all
>>
>> Since I did not get a reply on the Pacemaker ML, please let me ask
>> the question here as well.
>>
>> I am measuring the performance of Pacemaker with the following combination:
>> Pacemaker-1.1.11.rc1
>> libqb-0.16.0
>> corosync-2.3.2
>>
>> All nodes are KVM virtual machines.
>>
>> After starting 14 nodes, I forcibly stopped the vm01 node.
>> "virsh destroy vm01" was used for the stop.
>> Then, in addition to the forcibly stopped node, other nodes were
>> separated from the cluster.
>>
>> Corosync then outputs the "Retransmit List:" log message in large
>> quantities.
>>
>> Why does a node on which no failure has occurred get marked as
>> "lost"?
>>
>> Please advise if there is a problem somewhere in the setup.
>>
>> I have attached a report from when the problem occurred.
>>
>> https://drive.google.com/file/d/0BwMFJItoO-fVMkFWWWlQQldsSFU/edit?usp=sharing
>>
>> Regards,
>> Yusuke
>
>
> Was the lost node fenced (stonithed) successfully? Did you change the totem
> token timeout or the maximum number of allowed lost tokens? Was there
> anything interesting in the log file(s) of the remaining healthy node(s)?
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?



-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida@xxxxxxxxx
----------------------------------------
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss



