Re: SAFE delivery feature request

andrei.elkin@xxxxxxxxxx · Tue, 28 Jan 2014 13:43:00 +0200

Hello, Steven.

> On 01/18/2014 12:16 PM, andrei.elkin@xxxxxxxxxx wrote:
>> Hello.
>>
>> As a new member of the mailing list, let me start off by
>> thanking you for this great piece of software!
>>
>> Unfortunately, the unimplemented CPG_TYPE_SAFE seriously deters
>> State Machine Replication projects, such as database replication,
>> from utilizing this type of communication mechanism.
>> A use case that shows the danger must be well known. Still, it would be
>> good to describe it here; maybe I will learn what workarounds
>> people have found.
> Implementing SAFE in totemsrp is dead simple.  Implementing SAFE in
> totempg (the fragmentation and assembly layer) + cpg is much more
> difficult.  Another problem is there is no way to verify the IPC
> delivery queue has actually delivered a message that is tied into the
> implementation of Totem. 

Thanks for explaining! 

Indeed, I was thinking about the simplicity as a theorist.

> Where I have said in the past that implementing
> SAFE is easy, I meant the totemsrp.c codebase.  It is probably
> less than a 10-line code change.  The hardest part is dealing with
> configuration changes.

I was curious about your estimate of the line count.
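As a point of reference, the totemsrp-level rule might be modelled roughly as below. This is a simplified sketch, not code from totemsrp.c: it assumes the rule as I read it in the Totem single-ring papers, namely that a message is safe once its sequence number is covered by the token's aru (all-received-up-to) value on two consecutive token visits, i.e. by min(current aru, previous aru). The struct and function names are hypothetical, and neither configuration changes nor sequence-number wraparound are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-processor bookkeeping; not taken from totemsrp.c. */
struct safe_tracker {
    uint32_t prev_token_aru;  /* aru carried by the token on the previous visit */
    uint32_t safe_upto;       /* highest seq known to be received by every processor */
    uint32_t delivered_upto;  /* highest seq already delivered as SAFE */
};

/* Called on every token visit with the aru the token carries.
 * Assumed rule: seq is safe once seq <= min(aru now, aru on previous visit). */
static void safe_on_token(struct safe_tracker *t, uint32_t token_aru)
{
    uint32_t covered = token_aru < t->prev_token_aru ? token_aru : t->prev_token_aru;

    if (covered > t->safe_upto)
        t->safe_upto = covered;
    t->prev_token_aru = token_aru;
}

/* Advance SAFE delivery over locally received messages, in seq order.
 * Returns how many messages became deliverable as SAFE. */
static uint32_t deliver_safe(struct safe_tracker *t, uint32_t received_upto)
{
    uint32_t limit = t->safe_upto < received_upto ? t->safe_upto : received_upto;
    uint32_t n = 0;

    while (t->delivered_upto < limit) {
        t->delivered_upto++;  /* real code would hand message delivered_upto to the app */
        n++;
    }
    assert(t->delivered_upto <= t->safe_upto);
    return n;
}
```

Starting from a zeroed tracker, a token carrying aru 5 makes nothing safe yet; a following visit with aru 7 makes messages 1..5 safe, and 6..7 become safe on the next visit if the aru is still at least 7.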

>
> I'm not sure that implementing safe at that level would actually give
> you what you want with cpg.
>
> What would be handy for totem to have is a CPG that avoids the totempg
> layer entirely (and limits message sizes to MTU) so that applications
> could indeed utilize SAFE guarantees correctly.  The apps themselves
> would have to be responsible for handling fragmentation and assembly
> though, which is how most modern applications of Totem work outside of
> the Corosync universe.
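To make the trade-off concrete, application-level fragmentation over an MTU-limited CPG could look roughly like this. The header layout, FRAG_MTU, REASM_MAX and the function names are all hypothetical. The sketch relies on Totem's total order to deliver one sender's fragments in sequence and reassembles for a single sender only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAG_MTU 1400          /* hypothetical per-message size limit (bytes) */
#define REASM_MAX (64 * 1024)  /* hypothetical largest application message */

/* Hypothetical application-level header sent in front of each payload. */
struct frag_hdr {
    uint32_t msg_id;  /* per-sender message counter */
    uint16_t index;   /* fragment number, 0 .. count-1 */
    uint16_t count;   /* total fragments in this message */
    uint16_t len;     /* payload bytes carried by this fragment */
};

#define FRAG_PAYLOAD (FRAG_MTU - sizeof(struct frag_hdr))

static uint16_t frag_count(size_t msg_len)
{
    if (msg_len == 0)
        return 1;  /* an empty message still travels as one fragment */
    return (uint16_t)((msg_len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD);
}

/* Fill in fragment number `index` of `msg`; returns the payload length. */
static size_t make_fragment(const uint8_t *msg, size_t msg_len, uint32_t msg_id,
                            uint16_t index, struct frag_hdr *hdr, uint8_t *payload)
{
    size_t off = (size_t)index * FRAG_PAYLOAD;
    size_t n;

    hdr->count = frag_count(msg_len);
    assert(index < hdr->count);
    n = msg_len - off < FRAG_PAYLOAD ? msg_len - off : FRAG_PAYLOAD;
    hdr->msg_id = msg_id;
    hdr->index = index;
    hdr->len = (uint16_t)n;
    memcpy(payload, msg + off, n);
    return n;
}

struct reasm {
    uint32_t msg_id;
    uint16_t next;  /* next expected fragment index */
    size_t len;
    uint8_t buf[REASM_MAX];
};

/* Returns 1 when the message is complete, 0 if more fragments are needed,
 * -1 on an unexpected fragment or overflow. */
static int reasm_add(struct reasm *r, const struct frag_hdr *h, const uint8_t *payload)
{
    if (h->index == 0) {  /* first fragment starts a new message */
        r->msg_id = h->msg_id;
        r->next = 0;
        r->len = 0;
    }
    if (h->msg_id != r->msg_id || h->index != r->next || r->len + h->len > REASM_MAX)
        return -1;
    memcpy(r->buf + r->len, payload, h->len);
    r->len += h->len;
    r->next++;
    return r->next == h->count ? 1 : 0;
}
```

Fragments of one message may be interleaved with other senders' messages in the total order, so a real application would keep one reassembly buffer per sender.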

Well, handing fragmentation over to the application is something of a last
resort in my eyes.
I have not looked into the details of your fragmentation layer, but would it work
to deliver an assembled message as SAFE once safe prefix delivery (as defined in the GCS
specification by Chockler, Keidar and Vitenberg, 2001) has been made for it?
That is, when SAFE delivery is requested, Totem would deliver, in addition to the
message fragments, the safety mark of the last fragment.
The fragmentation layer would wait for that mark before delivering the assembled message.

Safe prefix delivery would cost those 10 lines. Whether the corresponding changes in the
fragmentation layer are feasible is unknown to me. The same goes for CPG, which
likewise must not deliver a SAFE message before its safe prefix has been delivered.
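The proposed interaction between Totem and the fragmentation layer might be sketched as follows. Everything here is hypothetical: it assumes Totem reports, besides the fragments themselves, an advancing safe prefix (the highest sequence number known to be received by all members), and that the fragmentation layer records the Totem sequence number of each message's last fragment. Because fragments arrive in total order, completed messages queue up in increasing order of that number, so release only has to pop from the head. Configuration changes are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_MAX 128  /* hypothetical bound on reassembled-but-unsafe messages */

/* A reassembled message waiting until its last fragment is covered by the safe prefix. */
struct pending {
    uint32_t last_frag_seq;  /* Totem seq of the message's last fragment */
    int msg_handle;          /* stand-in for the reassembled payload */
};

struct safe_gate {
    struct pending q[PENDING_MAX];  /* FIFO ring buffer, in agreed order */
    size_t head, count;
    uint32_t safe_prefix;           /* highest seq known to be safe */
};

/* Queue a freshly reassembled message; it is NOT delivered yet. */
static int gate_push(struct safe_gate *g, uint32_t last_frag_seq, int msg_handle)
{
    size_t tail;

    if (g->count == PENDING_MAX)
        return -1;
    tail = (g->head + g->count) % PENDING_MAX;
    /* total order: completion order matches last-fragment order */
    assert(g->count == 0 ||
           g->q[(tail + PENDING_MAX - 1) % PENDING_MAX].last_frag_seq < last_frag_seq);
    g->q[tail].last_frag_seq = last_frag_seq;
    g->q[tail].msg_handle = msg_handle;
    g->count++;
    return 0;
}

/* Totem reports that the safe prefix reached `seq`: release every queued
 * message whose last fragment is covered, writing handles to `out`. */
static size_t gate_on_safe_mark(struct safe_gate *g, uint32_t seq, int *out, size_t out_cap)
{
    size_t n = 0;

    if (seq > g->safe_prefix)
        g->safe_prefix = seq;
    while (g->count > 0 && n < out_cap &&
           g->q[g->head].last_frag_seq <= g->safe_prefix) {
        out[n++] = g->q[g->head].msg_handle;
        g->head = (g->head + 1) % PENDING_MAX;
        g->count--;
    }
    return n;
}
```

A message whose last fragment is already covered when it is pushed is simply released on the next gate_on_safe_mark() call, which the caller can make with the current prefix.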

I would love to hear your reply!

Andrei

>
> Regards
> -steve
>
>>    Suppose the cluster consists of three nodes N1, N2 and N3.
>>    By some point they have all delivered k-1 (totally ordered) messages.
>>    Suppose that at that point N1 sends out its message and at once the ring splits
>>    into N1 and N2+N3 subrings, so that N1's message is lost to N2+N3.
>>    Since only CPG_TYPE_AGREED delivery semantics is available, N1 may
>>    deliver (order) the message as m_k, so the application instance on N1 will process it
>>    and change its state; let's denote that formally as
>>
>>        N1.state = apply(m_k).
>>
>>    The application state of N2 and N3 would remain the one corresponding to message m_(k-1).
>>    But if they took over the cluster role *at once*, which they could
>>    because they form a majority of the former membership, the first message
>>    they delivered would make their states inconsistent with that of N1, because
>>
>>       N2.state = apply(m_k'), m_k' != m_k.
>>
>>    Notice that this inconsistency generally can't be mended by exchanging m_k'
>>    and m_k when N1 meets N2+N3 again in a common configuration.
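A toy model of the scenario above makes the difference explicit. It is deliberately simplified and hypothetical: a replica's state is a fold over the messages it applied, the split happens right after N1 sends m_k, and SAFE is reduced to "deliver only if every member of the configuration has received the message" (delivery in a transitional configuration, as in extended virtual synchrony, is not modelled). Under AGREED, N1 ends in a state N2+N3 never pass through; under SAFE, N1 merely lags behind and can catch up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NODES 3  /* N1, N2, N3 */

/* Toy replicated state machine: the state is a fold over applied messages. */
static uint64_t apply(uint64_t state, uint64_t msg)
{
    return state * 31u + msg;
}

/* May the sender N1 (index 0) deliver its own message in the old configuration?
 * received[i] says whether node i got the message before the ring split. */
static bool sender_may_deliver(bool safe, const bool received[NODES])
{
    assert(received[0]);  /* the sender always has its own message */
    if (!safe)
        return true;       /* AGREED: ordering alone is enough */
    for (int i = 0; i < NODES; i++)
        if (!received[i])
            return false;  /* SAFE: every member must have the message */
    return true;
}

/* Returns true if N1 ends up in a state N2+N3 never passed through,
 * i.e. the replicas diverged instead of N1 merely lagging behind. */
static bool diverges(bool safe)
{
    const uint64_t m_k = 1001, m_k_prime = 2002;          /* arbitrary payloads */
    const bool received[NODES] = { true, false, false };  /* m_k lost to N2+N3 */
    uint64_t prefix = 0, n1, n2;

    for (uint64_t m = 1; m <= 9; m++)  /* the k-1 = 9 messages all nodes delivered */
        prefix = apply(prefix, m);
    n1 = n2 = prefix;
    if (sender_may_deliver(safe, received))
        n1 = apply(n1, m_k);            /* N1 applies m_k */
    n2 = apply(n2, m_k_prime);          /* N2+N3 take over and apply m_k' (N3 as N2) */
    return n1 != prefix && n1 != n2;
}
```

Here diverges(false) is true and diverges(true) is false.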
>>
>> So, to summarize, no quorum-based solution for cluster role takeover
>> can work in general. For instance, database replication seems to be
>> infeasible.
>>
>> As for workarounds, there is just one that I see:
>> when the Totem ring configuration changes as above, the cluster service
>> should be deferred until N1 is back.
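The workaround amounts to a stricter resume rule than a majority quorum. A minimal sketch, with hypothetical node ids and function names, assuming the application remembers the membership of the last configuration in which it provided service: service resumes only once every node of that configuration is present again, so N2+N3 alone stay idle until N1 is back. The cost is availability, since a single permanently failed node blocks the service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is node `id` among the `n` ids in `ids`? */
static bool has_node(const uint32_t *ids, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id)
            return true;
    return false;
}

/* Resume the cluster service only if every node of the last serving
 * configuration is in the current one; a majority alone is not enough. */
static bool may_resume_service(const uint32_t *former, size_t n_former,
                               const uint32_t *current, size_t n_current)
{
    assert(former != NULL || n_former == 0);
    for (size_t i = 0; i < n_former; i++)
        if (!has_node(current, n_current, former[i]))
            return false;
    return true;
}
```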
>> I don't think it can be counted as universal. At the same time, SAFE delivery
>> should not really be a challenging task, according to my reading of the
>> Totem protocol as well as a mail I found:
>>
>>    From sdake at redhat.com  Sun Mar 11 20:18:51 2012
>>    From: sdake at redhat.com (Steven Dake)
>>    Date: Sun, 11 Mar 2012 13:18:51 -0700
>>    Subject:  [PATCH] drop evs service
>>    In-Reply-To: <1331449088-28169-1-git-send-email-fdinitto@xxxxxxxxxx>
>>    References: <1331449088-28169-1-git-send-email-fdinitto@xxxxxxxxxx>
>>    Message-ID: <4F5D08AB.1010202@xxxxxxxxxx>
>>
>>    Ugh
>>    On 03/10/2012 11:58 PM, Fabio M. Di Nitto wrote:
>>    > From: "Fabio M. Di Nitto" <fdinitto at redhat.com>
>>    >
>>    > there are several reasons for this:
>>    >
>>    > 1) evs is only partially implemented with no plans to complete
>>    > it
>>    >
>>    > typedef enum {
>>    >        EVS_TYPE_UNORDERED, /* not implemented */
>>    >        EVS_TYPE_FIFO,          /* same as agreed */
>>    >        EVS_TYPE_AGREED,
>>    >        EVS_TYPE_SAFE           /* not implemented */
>>    > } evs_guarantee_t;
>>    >
>>
>>    We should implement safe at some point - its pretty easy to do.
>>
>> With best wishes,
>>
>> Andrei
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss