On 01/18/2014 12:16 PM, andrei.elkin@xxxxxxxxxx wrote:
Hello.
As a new member of the mailing list, let me start off by
thanking you for this great piece of software!
Unfortunately, the unimplemented CPG_TYPE_SAFE seriously deters
State Machine Replication projects, such as database replication,
from utilizing this type of communication mechanism.
A use case that shows the danger must be well known. Still, it seems
worth describing here; perhaps I will learn what workarounds
people have found.
Implementing SAFE in totemsrp is dead simple. Implementing SAFE in
totempg (the fragmentation and assembly layer) + cpg is much more
difficult. Another problem is there is no way to verify the IPC
delivery queue has actually delivered a message that is tied into the
implementation of Totem. In the past, when I have said implementing
SAFE is easy, I meant the totemsrp.c codebase. It is probably less
than 10 lines of code change. The hardest part is dealing with
configuration changes.
I'm not sure that implementing safe at that level would actually give
you what you want with cpg.
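[Editor's note: for context, the SAFE delivery rule in the Totem single-ring protocol is simple to state: a totally ordered message with sequence number seq may be safe-delivered only once every processor on the ring has acknowledged all messages up to and including seq. A minimal sketch of that predicate (the `aru` name follows the totemsrp "all received up to" convention; the helper itself is hypothetical, not Corosync code):]

```python
def safe_deliverable(seq, member_arus):
    """A message with sequence number `seq` is SAFE-deliverable only
    after every ring member's aru ("all received up to") counter has
    reached `seq`, i.e. everyone has acknowledged receiving it."""
    return all(aru >= seq for aru in member_arus)

# Message 5 is AGREED-deliverable locally as soon as it is ordered,
# but not yet SAFE while one member has only acknowledged up to 4.
assert safe_deliverable(5, [7, 5, 4]) is False
assert safe_deliverable(5, [7, 5, 5]) is True
```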
What would be handy for totem to have is a CPG that avoids the totempg
layer entirely (and limits message sizes to MTU) so that applications
could indeed utilize SAFE guarantees correctly. The apps themselves
would have to be responsible for handling fragmentation and assembly
though, which is how most modern applications of Totem work outside of
the Corosync universe.
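[Editor's note: if CPG bypassed the totempg layer and capped messages at the ring MTU, each application would carry its own fragmentation and reassembly. A toy sketch of what that per-application layer might look like (the MTU value and the (index, total) fragment header are illustrative assumptions, not part of any Corosync API):]

```python
MTU = 1500  # assumed per-message payload budget, purely for illustration

def fragment(msg: bytes, mtu: int = MTU):
    """Split an application message into MTU-sized fragments, each
    tagged with (index, total) so the receiver can reassemble them."""
    chunks = [msg[i:i + mtu] for i in range(0, len(msg), mtu)] or [b""]
    total = len(chunks)
    return [(idx, total, chunk) for idx, chunk in enumerate(chunks)]

def reassemble(fragments):
    """Rebuild the original message once all fragments have arrived."""
    fragments = sorted(fragments, key=lambda f: f[0])
    assert len(fragments) == fragments[0][1], "missing fragments"
    return b"".join(chunk for _, _, chunk in fragments)

payload = b"x" * 4000
assert reassemble(fragment(payload)) == payload  # round-trips intact
```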
Regards
-steve
Suppose the cluster consists of three nodes N1, N2 and N3.
At some point they have all delivered (totally ordered) k-1 messages.
Suppose N1 then sends out its message and at that very moment the ring splits
into N1 and N2+N3 subrings, so that N1's message is lost for N2+N3.
With CPG_TYPE_AGREED, the only available delivery semantics, N1 may
deliver (order) the message as m_k, so the application instance on N1 will process it
and change its state; let's denote that formally as
N1.state = apply(m_k).
The application state on N2 + N3 would still correspond to message m_{k-1}.
But if they take on the cluster role *at once*, which they could
because they form a majority of the former membership, the first message
they deliver would make their states inconsistent with that of N1, because
N2.state = apply(m_k'), m_k' != m_k.
Notice that this inconsistency generally can't be mended by exchanging m_k'
and m_k when N1 meets N2+N3 again in a common configuration.
So, to summarize: any quorate solution to cluster role takeover
generally can't work. For instance, database replication seems to be
infeasible.
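[Editor's note: the divergence above can be made concrete with a toy replicated state machine, here modeled as a hash chain so that any difference in the delivered history shows up as a different state. The apply() model is an illustration, not part of Totem or Corosync:]

```python
import hashlib

def apply(state: bytes, msg: bytes) -> bytes:
    """Toy deterministic state transition: chain-hash the message,
    so identical histories yield identical states and vice versa."""
    return hashlib.sha256(state + msg).digest()

common = b"\x00" * 32
for m in [b"m_1", b"m_2"]:        # the k-1 messages everyone delivered
    common = apply(common, m)

n1 = apply(common, b"m_k")        # N1 self-delivers m_k before the split...
n2 = apply(common, b"m_k'")       # ...while N2+N3 deliver a different m_k'

assert n1 != n2                   # the replicas have silently diverged
```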
As to workarounds, there is just one that I see:
when the totem ring configuration changes as above, the cluster service
should be deferred until N1 is back.
I don't think that can be counted as universal. At the same time, SAFE delivery
should not really be a challenging task, according to my reading of the
Totem protocol, as well as to a mail found:
From sdake at redhat.com Sun Mar 11 20:18:51 2012
From: sdake at redhat.com (Steven Dake)
Date: Sun, 11 Mar 2012 13:18:51 -0700
Subject: [PATCH] drop evs service
In-Reply-To: <1331449088-28169-1-git-send-email-fdinitto@xxxxxxxxxx>
References: <1331449088-28169-1-git-send-email-fdinitto@xxxxxxxxxx>
Message-ID: <4F5D08AB.1010202@xxxxxxxxxx>
Ugh
On 03/10/2012 11:58 PM, Fabio M. Di Nitto wrote:
> From: "Fabio M. Di Nitto" <fdinitto at redhat.com>
>
> there are several reasons for this:
>
> 1) evs is only partially implemented with no plans to complete it
>
> typedef enum {
> EVS_TYPE_UNORDERED, /* not implemented */
> EVS_TYPE_FIFO, /* same as agreed */
> EVS_TYPE_AGREED,
> EVS_TYPE_SAFE /* not implemented */
> } evs_guarantee_t;
>
We should implement safe at some point - its pretty easy to do.
With best wishes,
Andrei
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss