Re: how is failure detection achieved in Corosync?

There is a "leader" node, but it's arbitrarily chosen and can change whenever the membership changes.
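On the question of who knows where the token should be: one common approach in token-ring protocols is for every node to keep its own timer and treat a token that fails to arrive in time as a sign of failure, so no single node has to track the token. The sketch below only illustrates that idea and is not Corosync's code; the class name, method names, and timeout value are all hypothetical.

```python
import time


class TokenLossDetector:
    """Hypothetical per-node timer: reports token loss if no token arrives in time."""

    def __init__(self, token_timeout_s, clock=time.monotonic):
        # token_timeout_s is an assumed, illustrative setting (seconds).
        self.token_timeout_s = token_timeout_s
        self.clock = clock
        self.last_token_at = clock()

    def on_token_received(self):
        # Each time the token passes through this node, restart the timer.
        self.last_token_at = self.clock()

    def token_lost(self):
        # True once the token has been absent for longer than the timeout.
        return self.clock() - self.last_token_at > self.token_timeout_s


if __name__ == "__main__":
    detector = TokenLossDetector(token_timeout_s=1.0)
    detector.on_token_received()
    print("token lost?", detector.token_lost())
```

In a real cluster the reaction to a lost token would be to start forming a new membership; that part is deliberately left out here.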

As for cloud stacks, I can offer no advice as I use my own setup (the exact one in that tutorial I linked), so I have little experience with those.

Cheers

digimer

On 04/11/2013 07:54 AM, Alejandro Z. Tomsic wrote:
Hello digimer,

Thank you for your reply.
One further thing is not clear to me: when the token is going around the cluster, is there a leader that checks (and knows) where the token is (or should be)?
Furthermore, do you know which open cloud stacks (like OpenNebula, OpenStack, Eucalyptus, CloudStack) use (or can use) Corosync?
best,

Alejandro



On 10/04/2013, at 16:30, Digimer <lists@xxxxxxxxxx> wrote:

Hi Alejandro,

  I cover how Corosync does this as part of a discussion on fencing in Red Hat clusters. It covers, as best I could describe, how failure detection works:

https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing

  Hopefully that helps shed some light for you. :)

digimer

On 04/10/2013 06:36 AM, Alejandro Z. Tomsic wrote:
I would like to know how failure detection is achieved in Corosync
(if at all). I would like to know about the implementation details,
i.e. whether it is done at the physical, virtual machine, or application
level. Does Corosync use any known failure detection mechanisms, e.g.
[1][2][3][4], or any other? Where can I find this information?

Thank you in advance.

Alejandro




[1] M. Bertier, O. Marin, and P. Sens. Implementation and performance
evaluation of an adaptable failure detector. In International Conference
on Dependable Systems and Networks (DSN), pages 354–363, June 2002.

[2] W. Chen, S. Toueg, and M. K. Aguilera. On the quality of service of
failure detectors. IEEE Transactions on Computers, 51(5):561–580, May 2002.

[3] N. Hayashibara, X. Défago, R. Yared, and T. Katayama. The φ accrual
failure detector. In IEEE Symposium on Reliable Distributed Systems
(SRDS), pages 66–78, Oct. 2004.

[4] Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and
Michael Walfish. 2011. Detecting failures in distributed systems with
the Falcon spy network. In Proceedings of the Twenty-Third ACM Symposium
on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA,
279-294.


_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss



--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?


