Re: A question about Ceph's paxos implication

On Sat, May 20, 2017 at 12:38 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Fri, 19 May 2017, fisherman wrote:
> > On Fri, May 19, 2017 at 10:37 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > > On Fri, 19 May 2017, fisherman wrote:
> > >> Hi, Sage and all Cephers
> > >>
> > >>    I am reading Ceph's implementation of paxos and have a question about it.
> > >>    The question is given by an example below:
> > >>
> > >>    Assume there are 5 monitor nodes: n1, n2, n3, n4, n5.
> > >>
> > >> 1) Node n1 is the leader, all nodes are synchronized with
> > >> last_committed=100, and there is no pending operation;
> > >> 2) A client, say c1, sends a request R1 to n1;
> > >> 3) Node n1 proposes a value v(for R1) with log version 101, stores
> > >> version 101 and pending_v =101 in its db. But it goes down before
> > >> sending anything to other nodes;
> > >>    Note: only n1 has pending_v == 101.
> > >> 4) Node n2 becomes the leader (without n1) and the cluster becomes
> > >> active. Client c1 queries n2 for status, and the result shows R1 is
> > >> lost;
> > >> 5) Node n1 recovers and becomes leader again;
> > >> 6) Node n1 finds pending_v == 101 and log version 101, so R1 gets
> > >> replicated and applied;
> > >> 7) Client c1 queries again, and finds R1 has been applied.
> > >>     ==> inconsistent with the result of 4)
> > >>
> > >> Am I right on this point?
> > >
> > > IIRC at step 4, as soon as a quorum is formed without n1, the original
> > > proposal from n1 is rendered obsolete.  (If it isn't explicitly
> > > invalidated, it would very likely be invalidated implicitly as soon as
> > > the new quorum passed its first proposal.)
> >    Maybe the original proposal should be rendered obsolete in the
> > handle_last function, after getting an ack from everyone in the quorum,
> > but I can't find the code.
> >    It can be invalidated by the first proposal of the new quorum. The
> > inconsistency problem I described only occurs when a read happens before
> > any new proposal.
>
> Yeah, I think the simplest fix is to *always* propose from
> handle_last.  If a previously proposed value wasn't learned, we can
> do a 'null' proposal that still bumps up last_committed.  That happens
> before the lease is extended so we avoid any window of readability
> before the quorum could fail and a new round including n1 could re-propose
> the old value.  This guard
>
>       // did we learn an old value?
>       if (uncommitted_v == last_committed+1 &&
>           uncommitted_value.length()) {
>
> would prevent it from being used because last_committed would have
> advanced.
>
> Does that seem reasonable?

Yeah, I think a 'noop' proposal can fix this problem. The old proposal
will be invalidated, since the 'noop' proposal has a higher PN.
The paper "Paxos Made Simple" recommends using a 'noop' command to fill gaps.
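
To make the scenario concrete, here is a rough sketch of how I understand it.
It is a toy model, not the real Paxos class: MonState, recover() and replay()
are names I made up, proposal numbers and the accept phase are left out
entirely, and a "recovery" is assumed to commit instantly on every member of
the quorum. Only last_committed, uncommitted_v, uncommitted_value and the
guard itself come from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy per-monitor state; field names follow the thread, the struct is hypothetical.
struct MonState {
  uint64_t last_committed = 0;
  uint64_t uncommitted_v = 0;      // version of a pending, not-yet-learned value
  std::string uncommitted_value;   // empty means nothing is pending
};

// Hypothetical stand-in for what a new leader does once the whole quorum has
// answered. Returns the value it commits during recovery; an empty string
// means either a no-op was committed or nothing was proposed at all.
std::string recover(const std::vector<MonState*>& quorum, bool always_propose) {
  // Everyone catches up to the newest committed version seen in the quorum.
  uint64_t lc = 0;
  for (const MonState* m : quorum) lc = std::max(lc, m->last_committed);

  // "did we learn an old value?" (the guard quoted above)
  std::string value;
  bool learned = false;
  for (const MonState* m : quorum) {
    if (m->uncommitted_v == lc + 1 && m->uncommitted_value.length()) {
      value = m->uncommitted_value;
      learned = true;
    }
  }

  // With the proposed fix, an empty (no-op) value is committed when nothing
  // was learned, so last_committed still moves forward.
  const bool propose = learned || always_propose;
  for (MonState* m : quorum) {
    m->last_committed = propose ? lc + 1 : lc;
    if (propose) {
      m->uncommitted_v = 0;
      m->uncommitted_value.clear();
    }
  }
  return value;
}

// Replays steps 1-6 of the example and returns what gets committed when n1
// comes back and leads again.
std::string replay(bool always_propose) {
  MonState n[5];
  for (MonState& m : n) m.last_committed = 100;  // step 1
  n[0].uncommitted_v = 101;                      // step 3: only n1 stores R1
  n[0].uncommitted_value = "R1";

  std::vector<MonState*> without_n1 = {&n[1], &n[2], &n[3], &n[4]};
  recover(without_n1, always_propose);           // step 4: n2 leads, n1 is down

  std::vector<MonState*> all = {&n[0], &n[1], &n[2], &n[3], &n[4]};
  return recover(all, always_propose);           // steps 5-6: n1 is back
}
```

In this model replay(false) returns "R1", which is the inconsistency from
step 7, while replay(true) returns an empty value: the no-op has already
moved last_committed to 101, so n1's uncommitted_v == 101 no longer equals
last_committed+1 and the guard skips it.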

>
>
> sage