On Tue, Mar 27, 2018 at 5:43 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Tue, Mar 27, 2018 at 10:26 AM, kefu chai <tchaikov@xxxxxxxxx> wrote:
>> On Tue, Mar 20, 2018 at 6:45 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>> On Mon, 19 Mar 2018, Gregory Farnum wrote:
>>>> On Mon, Mar 19, 2018 at 7:33 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> > Hi all,
>>>> >
>>>> > I was looking at places in ceph-mgr where we send a command from a
>>>> > module, and then want to proceed with some logic that involves reading
>>>> > the osdmap (there is a local copy in the manager, maintained by
>>>> > Objecter).
>>>> >
>>>> > I had been thinking that we should include cluster map epochs in the
>>>> > MMonCommandAck messages so that the client can (optionally) wait for
>>>> > that latest OSDMap before it considers the command complete.
>>>> >
>>>> > Then I thought, maybe this isn't necessary at all, because the mons
>>>> > would be doing the check_subs() etc calls before they actually respond
>>>> > to commands, so clients would always get their updated maps before
>>>> > seeing a command response message.
>>>> >
>>>> > So: mon experts, what do you think? Is it safe to assume that clients
>>>> > will get their subscription updates before a command completion (even
>>>> > in the case of commands being forwarded)? Or do we maybe need a
>>>> > little bit more logic on the client side in the manager?
>>>>
>>>> I would expect the order of op replies versus subscription fulfillment
>>>> messages to be an implementation detail, even if we do currently spool
>>>> off new map subscription requests inline with committing them. (I
>>>> don’t know at all if that’s the case.)
>>>
>>> Currently all of the subs are satisfied by update_from_paxos(), which
>>> means they get fulfilled before any replies (which are waiting_for_commit
>>> completions).
>>> Having recently fixed one of the monitor services that wasn't doing
>>> this (to fix a subscription bug), I'm pretty confident this
>>> is the "right" place to do it given how the mon is currently structured.
>>
>> If the command *updates* the state of the monitor, in the sense that it
>> triggers a proposal, I think it's safe to assume that the client which
>> sends the command will be updated with the latest osdmap. But if it just
>> *queries* the cluster status from the mon, and the behavior of the client
>> depends on the osdmap, there is a risk of racing. John, what specific
>> ceph-mgr module or calling path were you looking at?
>
> I was thinking specifically of updates. This was in some code I'm
> working on that creates pools: need to make sure that after my "osd
> pool create" command is done, it'll be reflected in the local OSDMap.
>
> It seems like the consensus is that it would be worthwhile to include
> map epochs in the command response for commands that update things.
>

Yeah, I feel the same.

> John
>
>>
>>>
>>> I think we have two options: acknowledge and enshrine this as part of the
>>> mon protocol, as John suggests. No code changes, but some small risk of
>>> regretting this if the mon ever gets a complete rewrite.
>>>
>>> Or add epochs to the MonCommands so that clients can explicitly wait.
>>> There is almost precedent for this in that PaxosService messages (special
>>> purpose non-command messages) have a version in them and their replies
>>> generally include one as well. It would take quite a bit of work to
>>> extend this to include commands, though, and even if we did there are some
>>> commands that span multiple services and thus have an ill-defined
>>> version/epoch to pin themselves to. This would require a lot of work and
>>> at the end of the day would require extra code on the clients to be
>>> "correct"...
>>> code that would never actually be exercised because, in
>>> reality, the current mon implementation always returns the maps before the
>>> reply.
>>>
>>> I can't think of any reason why we'd opt for #2 given the opportunity
>>> cost.
>>>
>>> sage

--
Regards
Kefu Chai
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
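For illustration, option #2 on the client side might look roughly like the minimal Python sketch below. It assumes a hypothetical command ack that carries the OSDMap epoch the command committed in; `CommandAck`, `LocalOSDMapCache`, `wait_for_epoch()` and `run_command_and_wait()` are made-up names, not the real MMonCommandAck or ceph-mgr API, and the cache only stands in for the Objecter-maintained OSDMap copy in the mgr.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class CommandAck:
    """Hypothetical command reply that also carries the OSDMap epoch the
    command was committed in (an assumption, not a real MMonCommandAck field)."""
    retcode: int
    osdmap_epoch: int


class LocalOSDMapCache:
    """Hypothetical stand-in for the mgr's local OSDMap copy."""

    def __init__(self) -> None:
        self._epoch = 0
        self._cond = threading.Condition()

    @property
    def epoch(self) -> int:
        with self._cond:
            return self._epoch

    def handle_map(self, epoch: int) -> None:
        # Called when a subscription update arrives; ignore stale epochs.
        with self._cond:
            if epoch > self._epoch:
                self._epoch = epoch
                self._cond.notify_all()

    def wait_for_epoch(self, epoch: int, timeout: float) -> bool:
        # Block until the local map is at least `epoch`; False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._epoch >= epoch, timeout=timeout)


def run_command_and_wait(send_command: Callable[[], CommandAck],
                         cache: LocalOSDMapCache,
                         timeout: float = 5.0) -> CommandAck:
    # Only treat a successful command as complete once the local map has
    # caught up to the epoch named in its ack.
    ack = send_command()
    if ack.retcode == 0 and not cache.wait_for_epoch(ack.osdmap_epoch, timeout):
        raise TimeoutError(f"local OSDMap never reached epoch {ack.osdmap_epoch}")
    return ack


if __name__ == "__main__":
    cache = LocalOSDMapCache()

    def fake_pool_create() -> CommandAck:
        # Simulate the race: the new map arrives ~100 ms after the ack.
        threading.Timer(0.1, cache.handle_map, args=(42,)).start()
        return CommandAck(retcode=0, osdmap_epoch=42)

    run_command_and_wait(fake_pool_create, cache)
    print(cache.epoch)  # 42
```

With the ordering Sage describes (subscriptions satisfied in update_from_paxos() before the waiting_for_commit replies go out), this wait would in practice always return immediately, which is his argument against the extra client code. The sketch also assumes a single epoch per command, which, as noted above, is ill-defined for commands that span multiple services.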