On Mon, 19 Mar 2018, Gregory Farnum wrote:
> On Mon, Mar 19, 2018 at 7:33 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> > Hi all,
> >
> > I was looking at places in ceph-mgr where we send a command from a
> > module, and then want to proceed with some logic that involves reading
> > the osdmap (there is a local copy in the manager, maintained by
> > Objecter).
> >
> > I had been thinking that we should include cluster map epochs in the
> > MMonCommandAck messages so that the client can (optionally) wait for
> > that latest OSDMap before it considers the command complete.
> >
> > Then I thought, maybe this isn't necessary at all, because the mons
> > would be doing the check_subs() etc calls before they actually respond
> > to commands, so clients would always get their updated maps before
> > seeing a command response message.
> >
> > So: mon experts, what do you think? Is it safe to assume that clients
> > will get their subscription updates before a command completion (even
> > in the case of commands being forwarded)? Or do we maybe need a
> > little bit more logic on the client side in the manager?
>
> I would expect the order of op replies versus subscription fulfillment
> messages to be an implementation detail, even if we do currently spool
> off new map subscription requests inline with committing them. (I
> don’t know at all if that’s the case.)

Currently all of the subs are satisfied by update_from_paxos(), which
means they get fulfilled before any replies (which are
waiting_for_commit completions). I recently fixed a subscription bug by
making one of the monitor services do this (it previously didn't), so
I'm pretty confident this is the "right" place to do it given how the
mon is currently structured.

I think we have two options. The first is to acknowledge and enshrine
this as part of the mon protocol, as John suggests. No code changes,
but some small risk of regretting this if the mon ever gets a complete
rewrite.

The second is to add epochs to the MonCommands so that clients can
explicitly wait. There is almost precedent for this, in that
PaxosService messages (special-purpose non-command messages) have a
version in them and their replies generally include one as well. It
would take quite a bit of work to extend this to include commands,
though, and even if we did, there are some commands that span multiple
services and thus have an ill-defined version/epoch to pin themselves
to. At the end of the day it would also require extra code on the
clients to be "correct"... code that would never actually be exercised
because, in reality, the current mon implementation always returns the
maps before the reply.

I can't think of any reason why we'd opt for #2 given the opportunity
cost.

sage
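
P.S. To illustrate the ordering argument, here is a toy Python model. It
is not mon code. Only update_from_paxos() and the waiting_for_commit
idea come from the discussion above; ToyMon, ToyClient, handle_command,
on_map and on_command_ack are hypothetical names made up for the sketch,
and the commit is assumed to happen synchronously, whereas the real mon
goes through a Paxos round first.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyClient:
    """Hypothetical subscriber that also issues commands."""
    map_epoch: int = 0
    log: List[str] = field(default_factory=list)

    def on_map(self, epoch: int) -> None:
        # Subscription update: remember the newest map epoch.
        self.map_epoch = epoch
        self.log.append(f"map {epoch}")

    def on_command_ack(self, name: str) -> None:
        # Command reply: record which map epoch we held when it arrived.
        self.log.append(f"ack {name} (map epoch {self.map_epoch})")


@dataclass
class ToyMon:
    """Toy stand-in for one monitor service (an assumption, not Ceph code)."""
    epoch: int = 0
    subscribers: List[ToyClient] = field(default_factory=list)
    waiting_for_commit: List[Callable[[], None]] = field(default_factory=list)

    def handle_command(self, client: ToyClient, name: str) -> None:
        # The reply is queued, not sent; it completes only after commit.
        self.waiting_for_commit.append(lambda: client.on_command_ack(name))
        self._commit(self.epoch + 1)

    def update_from_paxos(self) -> None:
        # Satisfy all map subscriptions for the newly committed epoch.
        for sub in self.subscribers:
            sub.on_map(self.epoch)

    def _commit(self, new_epoch: int) -> None:
        self.epoch = new_epoch
        self.update_from_paxos()  # subscriptions first ...
        callbacks, self.waiting_for_commit = self.waiting_for_commit, []
        for cb in callbacks:      # ... then the queued replies
            cb()


mon = ToyMon()
client = ToyClient()
mon.subscribers.append(client)
mon.handle_command(client, "osd pool create")
print(client.log)  # ['map 1', 'ack osd pool create (map epoch 1)']
```

The client sees "map 1" before the ack, which is the property option 1
would enshrine. Under option 2 the ack would instead carry an epoch and
the client would wait until map_epoch reached it.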