Re: It is sometimes judged to be node trouble.

Hi All,

I have given the patch some more thought.

I made a patch that logs information about a node whose "LEAVE" message was canceled.
In addition, based on whether the "LEAVE" message was received, the patch identifies a node that did not leave the cluster cleanly and writes it to the log.

The modifications are as follows.

Change 1) I added a list to totemsrp that keeps track of LEAVE nodes.
Change 2) I added handling to register, search for, and clear LEAVE nodes.
Change 3) I added the log output.
Change 4) I changed the output level of the log.
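
For readers without the attachment, Changes 1 to 3 can be pictured with the minimal sketch below. It is not the code from totemsrp.2.patch: the names leave_list, leave_list_add, leave_list_contains, leave_list_clear, leave_list_find_failed and the LEAVE_LIST_MAX capacity are all hypothetical, and the sketch only assumes that a node which left the membership without a recorded LEAVE message is the one to report as failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEAVE_LIST_MAX 128 /* hypothetical capacity, not taken from the patch */

/* Hypothetical list of node IDs from which a LEAVE message was received. */
struct leave_list {
    uint32_t nodeid[LEAVE_LIST_MAX];
    size_t count;
};

/* Clear: forget all recorded LEAVE nodes (e.g. after a new membership forms). */
static void leave_list_clear(struct leave_list *list)
{
    assert(list != NULL);
    list->count = 0;
}

/* Search: true if a LEAVE message was recorded for this node ID. */
static bool leave_list_contains(const struct leave_list *list, uint32_t nodeid)
{
    for (size_t i = 0; i < list->count; i++) {
        if (list->nodeid[i] == nodeid) {
            return true;
        }
    }
    return false;
}

/* Registration: record a node ID once; returns false only if the list is full. */
static bool leave_list_add(struct leave_list *list, uint32_t nodeid)
{
    if (leave_list_contains(list, nodeid)) {
        return true;
    }
    if (list->count == LEAVE_LIST_MAX) {
        return false;
    }
    list->nodeid[list->count++] = nodeid;
    return true;
}

/*
 * Copy into `failed` every departed node with no recorded LEAVE message.
 * `failed` must have room for `left_count` entries. Returns the number copied.
 */
static size_t leave_list_find_failed(const struct leave_list *list,
                                     const uint32_t *left, size_t left_count,
                                     uint32_t *failed)
{
    size_t n = 0;
    for (size_t i = 0; i < left_count; i++) {
        if (!leave_list_contains(list, left[i])) {
            failed[n++] = left[i];
        }
    }
    return n;
}
```

In this sketch, a caller would register a node ID when its LEAVE message arrives, call leave_list_find_failed with the "Members left" list when a new membership is formed, log any result, and then clear the list.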
 
With this patch, the log in case of node trouble looks like the following.

----When a node stops by shutdown and when a node dropped the LEAVE message--
May 28 10:29:26 snmp1 corosync[22981]:  [TOTEM ] A new membership (192.168.10.100:11400) was formed. Members left: 3232238190
May 28 10:29:26 snmp1 corosync[22981]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 28 10:29:26 snmp1 corosync[22981]:  [QUORUM] Members[1]: -1062729116
May 28 10:29:26 snmp1 corosync[22981]:  [MAIN  ] Completed service synchronization, ready to provide service.

----When a node stops by trouble (reboot, panic, etc.)---------------------
May 28 10:29:55 snmp1 corosync[22981]:  [TOTEM ] A processor failed, forming new configuration.
May 28 10:29:56 snmp1 corosync[22981]:  [TOTEM ] A new membership (192.168.10.100:11408) was formed. Members left: 3232238190
May 28 10:29:56 snmp1 corosync[22981]:  [TOTEM ] Failed to receive the leave message. failed: 3232238190
May 28 10:29:56 snmp1 corosync[22981]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 28 10:29:56 snmp1 corosync[22981]:  [QUORUM] Members[1]: -1062729116
May 28 10:29:56 snmp1 corosync[22981]:  [MAIN  ] Completed service synchronization, ready to provide service.
--------------------------------------------------------------------------
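
A note on reading the node IDs in these logs: the TOTEM lines print the ID as an unsigned value (3232238190) while the QUORUM lines print one as a signed value (-1062729116). Assuming the IDs in this cluster are derived from the nodes' IPv4 addresses, which the values above are consistent with, the small helper below decodes them. The function names nodeid_octets and nodeid_format are hypothetical and are not part of corosync.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Split a 32-bit node ID into four octets, most significant byte first. */
static void nodeid_octets(uint32_t nodeid, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)((nodeid >> 24) & 0xff);
    out[1] = (uint8_t)((nodeid >> 16) & 0xff);
    out[2] = (uint8_t)((nodeid >> 8) & 0xff);
    out[3] = (uint8_t)(nodeid & 0xff);
}

/* Format a node ID as a dotted-quad string; buf should hold at least 16 bytes. */
static int nodeid_format(uint32_t nodeid, char *buf, size_t buflen)
{
    uint8_t o[4];
    nodeid_octets(nodeid, o);
    return snprintf(buf, buflen, "%u.%u.%u.%u",
                    (unsigned)o[0], (unsigned)o[1],
                    (unsigned)o[2], (unsigned)o[3]);
}
```

Passing 3232238190u gives 192.168.10.110. Passing (uint32_t)-1062729116 gives 192.168.10.100, which matches the ring address shown in the "A new membership" lines, so the cast to an unsigned 32-bit value recovers the local node's ID.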

This patch is very useful for the problem that the "LEAVE" message is canceled when a cluster is configured with mcast.
With this patch, we can judge from the log how a remote node left the cluster.

Please review the patch and merge it.

 * In making this patch, I got advice from Christine.
 * Christine, thank you...

Best Regards,
Hideo Yamauchi.

Attachment: totemsrp.2.patch
Description: Binary data

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
