There is a "leader" node, but it's arbitrarily chosen and can change
whenever the membership changes.
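To illustrate how a ring can notice a lost token without a fixed leader tracking it, here is a minimal sketch, not corosync's actual code. The class `TokenWatch`, its field names, and the 1000 ms value are all made up for the example. The assumption is only that each node times how long it has gone without seeing the token and, past a limit, treats the ring as broken so that membership can be re-formed.

```python
from dataclasses import dataclass


@dataclass
class TokenWatch:
    """Hypothetical per-node timer; names and values are illustrative only."""
    timeout_ms: int          # how long a node waits for the token
    last_token_ms: int = 0   # time the token was last received

    def on_token(self, now_ms: int) -> None:
        # Receiving the token shows the ring was intact up to this node.
        self.last_token_ms = now_ms

    def token_lost(self, now_ms: int) -> bool:
        # No token within the timeout: treat the ring as broken.
        return now_ms - self.last_token_ms > self.timeout_ms


watch = TokenWatch(timeout_ms=1000)
watch.on_token(now_ms=200)
healthy = watch.token_lost(now_ms=900)   # 700 ms since the last token
failed = watch.token_lost(now_ms=1500)   # 1300 ms since the last token
```

Time is passed in explicitly so the sketch stays deterministic; a real node would read a monotonic clock instead.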
As for cloud stacks, I can offer no advice as I use my own setup (the
exact one in that tutorial I linked), so I have little experience with
those.
Cheers
digimer
On 04/11/2013 07:54 AM, Alejandro Z. Tomsic wrote:
Hello digimer,
Thank you for your reply.
One further thing is not clear to me: when the token is going around the cluster, is there a leader that checks (and knows) where the token is (or should be)?
Furthermore, do you know which open cloud stacks (like OpenNebula, OpenStack, Eucalyptus, CloudStack) use (or can use) corosync?
best,
Alejandro
On 10/04/2013, at 16:30, Digimer <lists@xxxxxxxxxx> wrote:
Hi Alejandro,
I cover how corosync does this as part of a discussion on fencing in Red Hat clusters. It covers, as best I could describe it, how failure detection works:
https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing
Hopefully that helps shed some light for you. :)
digimer
On 04/10/2013 06:36 AM, Alejandro Z. Tomsic wrote:
I would like to know how failure detection is done in Corosync (if at
all). I would like to know about the implementation details, i.e.
whether it is done at the physical, virtual machine, or application
level. Does Corosync use any known failure detection mechanism, e.g.
[1][2][3][4], or some other? Where can I find this information?
Thank you in advance.
Alejandro
[1] M. Bertier, O. Marin, and P. Sens. Implementation and performance
evaluation of an adaptable failure detector. In International Conference
on Dependable Systems and Networks (DSN), pages 354–363, June 2002.
[2] W. Chen, S. Toueg, and M. K. Aguilera. On the quality of service of
failure detectors. IEEE Transactions on Computers, 51(5):561–580, May 2002.
[3] N. Hayashibara, X. Défago, R. Yared, and T. Katayama. The φ accrual
failure detector. In IEEE Symposium on Reliable Distributed Systems
(SRDS), pages 66–78, Oct. 2004.
[4] Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and
Michael Walfish. 2011. Detecting failures in distributed systems with
the Falcon spy network. In Proceedings of the Twenty-Third ACM Symposium
on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA,
279-294.
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?