Re: how is failure detection achieved in Corosync?

Digimer <lists@xxxxxxxxxx> · Thu, 11 Apr 2013 14:45:39 -0400

There is a "leader" node, but it's arbitrarily chosen and can change 
whenever the membership changes.
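
To illustrate the idea (a rough sketch only, not Corosync's actual code): assume each node runs a timer that is reset whenever the token arrives and treats a missed deadline as token loss. The membership is then re-formed, and a new leader falls out of the new membership. The names TokenWatcher, reform_membership and pick_leader, the timeout value, and the lowest-ID leader rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TokenWatcher:
    """Hypothetical per-node timer that flags token loss."""
    token_timeout: float      # seconds; illustrative, not a Corosync default
    last_seen: float = 0.0    # time the token was last received

    def on_token(self, now: float) -> None:
        # Receiving the token resets the timer.
        self.last_seen = now

    def token_lost(self, now: float) -> bool:
        # True once the token has been missing longer than the timeout.
        return now - self.last_seen > self.token_timeout


def reform_membership(members, alive):
    """Keep only nodes that still respond (stand-in for the real membership protocol)."""
    return sorted(m for m in members if m in alive)


def pick_leader(members):
    """Assumed rule for this sketch: the lowest node ID leads."""
    return min(members)


if __name__ == "__main__":
    watcher = TokenWatcher(token_timeout=1.0)
    watcher.on_token(now=0.0)
    print(watcher.token_lost(now=0.5))   # token still within its deadline
    print(watcher.token_lost(now=1.5))   # deadline missed, treat as a failure

    members = reform_membership([1, 2, 3], alive={2, 3})
    print(members, pick_leader(members))  # leader changes with the membership
```

Note that in this sketch the timer only tells a node that the token went missing, not which node failed; finding out who is still there is left to the membership step.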

As for cloud stacks, I can offer no advice as I use my own setup (the 
exact one in that tutorial I linked), so I have little experience with 
those.

Cheers

digimer

On 04/11/2013 07:54 AM, Alejandro Z. Tomsic wrote:
Hello digimer,

Thank you for your reply.
One further thing is not clear to me: when the token is going around the cluster, is there a leader that checks (and knows) where the token is (or should be)?
Furthermore, do you know which open cloud stacks (like OpenNebula, OpenStack, Eucalyptus, CloudStack) use (or can use) corosync?
best,

Alejandro

On 10/04/2013, at 16:30, Digimer <lists@xxxxxxxxxx> wrote:

Hi Alejandro,

  I cover how corosync does this as part of a discussion on fencing in Red Hat clusters. It covers, as best I could describe it, how failure detection works:

https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing

  Hopefully that helps shed some light for you. :)

digimer

On 04/10/2013 06:36 AM, Alejandro Z. Tomsic wrote:
I would like to know how the process of failure detection is achieved in
Corosync (if any). I would like to know about the implementation
details, i.e. if it's done at the physical, virtual machine or application
level. Does Corosync use any known failure detection mechanisms? e.g.
[1][2][3][4] or any other. Where can I find this information?

Thank you in advance.

Alejandro

[1] M. Bertier, O. Marin, and P. Sens. Implementation and performance
evaluation of an adaptable failure detector. In International Conference
on Dependable Systems and Networks (DSN), pages 354–363, June 2002.

[2] W. Chen, S. Toueg, and M. K. Aguilera. On the quality of service of
failure detectors. IEEE Transactions on Computers, 51(5):561–580, May 2002.

[3] N. Hayashibara, X. Défago, R. Yared, and T. Katayama. The φ accrual
failure detector. In IEEE Symposium on Reliable Distributed Systems
(SRDS), pages 66–78, Oct. 2004.

[4] J. B. Leners, H. Wu, W.-L. Hung, M. K. Aguilera, and M. Walfish.
Detecting failures in distributed systems with the Falcon spy network.
In ACM Symposium on Operating Systems Principles (SOSP), pages 279–294,
2011.

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
