Re: A question about Ceph's paxos implication

On Fri, May 19, 2017 at 9:38 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Fri, 19 May 2017, fisherman wrote:
>> On Fri, May 19, 2017 at 10:37 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Fri, 19 May 2017, fisherman wrote:
>> >> Hi, Sage and all Cephers,
>> >>
>> >>    I am reading Ceph's implementation of paxos and have a question about it.
>> >>    The question is given by an example below:
>> >>
>> >>    Assume there are 5 monitor nodes: n1, n2, n3, n4, n5.
>> >>
>> >> 1) Node n1 is the leader, all nodes are synchronized with
>> >> last_committed=100, and there are no pending operations;
>> >> 2) A client, say c1, sends a request R1 to n1;
>> >> 3) Node n1 proposes a value v (for R1) with log version 101, and stores
>> >> version 101 and pending_v = 101 in its db. But it goes down before
>> >> sending anything to the other nodes;
>> >>    Note: only n1 has pending_v == 101.
>> >> 4) Node n2 becomes the leader (without n1) and the cluster becomes
>> >> active. Client c1 queries n2 for status, and the result shows R1 is
>> >> lost;
>> >> 5) Node n1 recovers and becomes leader again;
>> >> 6) Node n1 finds pending_v == 101 and log version 101, so R1 gets
>> >> replicated and applied;
>> >> 7) Client c1 queries again, and finds R1 has been applied.
>> >>     ==> inconsistent with the result of 4)
>> >>
>> >> Am I right on this point?
>> >
>> > IIRC at step 4, as soon as a quorum is formed without n1, the original
>> > proposal from n1 is rendered obsolete.  (If it isn't explicitly
>> > invalidated, it would very likely be invalidated implicitly as soon as
>> > the new quorum passed its first proposal.)
>>    Maybe the original proposal should be rendered obsolete in the
>> handle_last function, after getting acks from everyone in the quorum,
>> but I can't find the code.
>>    It can be invalidated by the first proposal of the new quorum. The
>> inconsistency problem I described only occurs when a read happens before
>> any new proposal.
>
> Yeah, I think the simplest fix is to *always* propose from
> handle_last.  If a previously proposed value wasn't learned, we can
> do a 'null' proposal that still bumps up last_committed.  That happens
> before the lease is extended so we avoid any window of readability
> before the quorum could fail and a new round including n1 could re-propose
> the old value.  This guard
>
>       // did we learn an old value?
>       if (uncommitted_v == last_committed+1 &&
>           uncommitted_value.length()) {
>
> would prevent it from being used because last_committed would have
> advanced.
>
> Does that seem reasonable?

I'm confused. Don't we consider any node with a higher election number
to have the longer log, and trim whoever has commits which don't match
that?
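
For reference, here is a minimal sketch of how the quoted guard would
behave if the new quorum always commits a 'null' proposal, as suggested
above. The field names (last_committed, uncommitted_v, uncommitted_value)
come from the quoted snippet; the struct and the three helper functions
are hypothetical and are not Ceph's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Toy model of one monitor's Paxos state. Field names follow the thread;
// everything else here is an illustrative assumption.
struct ToyPaxosState {
    uint64_t last_committed = 0;
    uint64_t uncommitted_v = 0;
    std::string uncommitted_value;
};

// The guard quoted in the thread: an old value is only re-proposed when it
// targets exactly the next version and is non-empty.
bool would_repropose(const ToyPaxosState& s) {
    return s.uncommitted_v == s.last_committed + 1 &&
           !s.uncommitted_value.empty();
}

// Hypothetical 'null' proposal: the new leader commits an empty value at
// last_committed + 1, so the version advances even with nothing to learn.
void commit_null_proposal(ToyPaxosState& s) {
    s.last_committed += 1;
}

// Hypothetical catch-up: a rejoining node adopts the quorum's
// last_committed if that is newer than its own.
void catch_up(ToyPaxosState& s, uint64_t quorum_last_committed) {
    s.last_committed = std::max(s.last_committed, quorum_last_committed);
    // Invariant of this toy model: last_committed never moves backwards.
    assert(s.last_committed >= quorum_last_committed);
}
```

In the scenario from the thread, n1 starts with last_committed=100,
uncommitted_v=101 and a non-empty value, so the guard passes. Once n2's
quorum has committed a null proposal at 101 and n1 has caught up, the
guard compares 101 against 102 and the stale value is not re-proposed.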

>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


