On Sat, May 20, 2017 at 12:38 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Fri, 19 May 2017, fisherman wrote:
> > On Fri, May 19, 2017 at 10:37 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > > On Fri, 19 May 2017, fisherman wrote:
> > >> Hi, Sage and all Cephers
> > >>
> > >> I am reading Ceph's implementation of Paxos and have a question about it.
> > >> The question is illustrated by the example below:
> > >>
> > >> Assume there are 5 monitor nodes: n1, n2, n3, n4, n5.
> > >>
> > >> 1) Node n1 is the leader, all nodes are synchronized with
> > >> Last_committed=100, and there is no pending operation;
> > >> 2) A client, say c1, sends a request R1 to n1;
> > >> 3) Node n1 proposes a value v (for R1) with log version 101, stores
> > >> version 101 and pending_v = 101 in its db. But it goes down before
> > >> sending anything to the other nodes;
> > >> Note: only n1 has pending_v == 101.
> > >> 4) Node n2 becomes the leader (without n1) and the cluster becomes
> > >> active. Client c1 queries n2 for status, and the result shows R1 is
> > >> lost;
> > >> 5) Node n1 recovers and becomes leader again;
> > >> 6) Node n1 finds pending_v == 101 and log version 101, so R1 gets
> > >> replicated and applied;
> > >> 7) Client c1 queries again, and finds R1 has been applied.
> > >> ==> inconsistent with the result of 4)
> > >>
> > >> Am I right on this point?
> > >
> > > IIRC at step 4, as soon as a quorum is formed without n1, the original
> > > proposal from n1 is rendered obsolete. (If it isn't explicitly
> > > invalidated, it would very likely be invalidated implicitly as soon as
> > > the new quorum passed its first proposal.)
> > Maybe the original proposal should be rendered obsolete in the
> > handle_last function, after getting an ack from everyone in the quorum,
> > but I can't find the code that does so.
> > It can be invalidated by the first proposal of the new quorum. The
> > inconsistency problem I described only occurs when a read happens before
> > any new proposal.
>
> Yeah, I think the simplest fix is to *always* propose from
> handle_last. If a previously proposed value wasn't learned, we can
> do a 'null' proposal that still bumps up last_committed. That happens
> before the lease is extended so we avoid any window of readability
> before the quorum could fail and a new round including n1 could re-propose
> the old value. This guard
>
>   // did we learn an old value?
>   if (uncommitted_v == last_committed+1 &&
>       uncommitted_value.length()) {
>
> would prevent it from being used because last_committed would have
> advanced.
>
> Does that seem reasonable?

Yeah, I think a 'noop' proposal can fix this problem. The old proposal will
be invalidated, since the 'noop' proposal has a higher proposal number (PN).
The paper "Paxos Made Simple" recommends using 'noop' commands to fill gaps.

>
> sage
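
For what it's worth, here is a minimal sketch of why the guard stops the stale value once a no-op has been committed by the new quorum. It is a toy model and not the Ceph code: ToyMonitor, would_repropose, commit_noop, sync_to and stale_value_resurfaces are names made up for illustration, and only last_committed, uncommitted_v and uncommitted_value come from the snippet quoted above. It ignores proposal numbers, leases and persistence entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy model of one monitor's Paxos state (hypothetical type). Only the
// field names follow the guard quoted in the thread.
struct ToyMonitor {
  uint64_t last_committed = 0;
  uint64_t uncommitted_v = 0;
  std::string uncommitted_value;
};

// The guard from the thread: an old uncommitted value is only picked up
// when it is exactly the next version after last_committed.
bool would_repropose(const ToyMonitor& m) {
  return m.uncommitted_v == m.last_committed + 1 &&
         m.uncommitted_value.length() > 0;
}

// Hypothetical stand-in for the 'null' proposal made from handle_last:
// it commits an empty value, which still bumps last_committed.
void commit_noop(ToyMonitor& m) {
  m.last_committed += 1;
}

// Hypothetical catch-up step: a recovering node adopts the quorum's
// last_committed. Its stale uncommitted value is left untouched here.
void sync_to(ToyMonitor& recovering, uint64_t quorum_last_committed) {
  assert(quorum_last_committed >= recovering.last_committed);
  recovering.last_committed = quorum_last_committed;
}

// Replays steps 3-6 of the example. Returns true if n1's stale value for
// R1 would still be re-proposed after n1 recovers.
bool stale_value_resurfaces(bool new_leader_proposes_noop) {
  ToyMonitor n1;  // step 3: n1 stored version 101 locally, then went down
  n1.last_committed = 100;
  n1.uncommitted_v = 101;
  n1.uncommitted_value = "R1";

  ToyMonitor n2;  // step 4: n2 leads a quorum without n1
  n2.last_committed = 100;
  if (new_leader_proposes_noop) {
    commit_noop(n2);  // last_committed becomes 101
  }

  sync_to(n1, n2.last_committed);  // step 5: n1 recovers and catches up
  return would_repropose(n1);      // step 6: does the guard let R1 through?
}
```

Under these assumptions, stale_value_resurfaces(false) returns true, which is the inconsistency from steps 4-7, while stale_value_resurfaces(true) returns false, because last_committed has advanced to 101 and uncommitted_v is no longer last_committed+1.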