On Tue, Mar 20, 2018 at 6:45 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Mon, 19 Mar 2018, Gregory Farnum wrote:
>> On Mon, Mar 19, 2018 at 7:33 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>> > Hi all,
>> >
>> > I was looking at places in ceph-mgr where we send a command from a
>> > module, and then want to proceed with some logic that involves reading
>> > the osdmap (there is a local copy in the manager, maintained by
>> > Objecter).
>> >
>> > I had been thinking that we should include cluster map epochs in the
>> > MMonCommandAck messages so that the client can (optionally) wait for
>> > that latest OSDMap before it considers the command complete.
>> >
>> > Then I thought, maybe this isn't necessary at all, because the mons
>> > would be doing the check_subs() etc calls before they actually respond
>> > to commands, so clients would always get their updated maps before
>> > seeing a command response message.
>> >
>> > So: mon experts, what do you think? Is it safe to assume that clients
>> > will get their subscription updates before a command completion (even
>> > in the case of commands being forwarded)? Or do we maybe need a
>> > little bit more logic on the client side in the manager?
>>
>> I would expect the order of op replies versus subscription fulfillment
>> messages to be an implementation detail, even if we do currently spool
>> off new map subscription requests inline with committing them. (I
>> don’t know at all if that’s the case.)
>
> Currently all of the subs are satisfied by update_from_paxos(), which
> means they get fulfilled before any replies (which are waiting_for_commit
> completions). Having recently fixed one of the monitor services to do this
> that wasn't in order to fix a subscription bug, I'm pretty confident this
> is the "right" place to do it given how the mon is currently structured.

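To make the ordering Sage describes concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `ToyMon`, its fields, and the command string are hypothetical and are not Ceph code. The one thing it borrows from the thread is the ordering itself: subscriptions are satisfied in the `update_from_paxos()` step, and the replies queued in `waiting_for_commit` go out afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMon:
    """Toy monitor (hypothetical): records the order of outgoing messages."""
    epoch: int = 1
    subscribers: list = field(default_factory=list)         # clients subscribed to the osdmap
    waiting_for_commit: list = field(default_factory=list)  # (client, command) awaiting commit
    sent: list = field(default_factory=list)                # outgoing messages, in send order

    def handle_command(self, client, command):
        # A state-changing command is queued until its proposal commits.
        self.waiting_for_commit.append((client, command))

    def update_from_paxos(self):
        # Stand-in for the step in which map subscriptions are satisfied.
        for client in self.subscribers:
            self.sent.append((client, "osdmap", self.epoch))

    def commit(self):
        # The proposal commits: bump the epoch and refresh state first...
        self.epoch += 1
        self.update_from_paxos()
        # ...and only then send the queued command replies.
        for client, command in self.waiting_for_commit:
            self.sent.append((client, "command_ack", command))
        self.waiting_for_commit.clear()


mon = ToyMon(subscribers=["mgr"])
mon.handle_command("mgr", "example-update-command")  # hypothetical command name
mon.commit()
for msg in mon.sent:
    print(msg)
```

In this model the subscriber always sees the new map before the ack, but only because `commit()` happens to call `update_from_paxos()` first, which is the implementation detail Greg is wary of relying on.
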
If the command *updates* the status of the monitor, in the sense that it
triggers a proposal, I think it's safe to assume that the client which sent
the command will be updated with the latest osdmap. But if it just *queries*
the cluster status from the mon, and the client's behavior depends on the
osdmap, there is a risk of a race.

John, which specific ceph-mgr module or calling path were you looking at?

> I think we have two options: acknowledge and enshrine this as part of the
> mon protocol as John suggests. No code changes but some small risk of
> regretting this if the mon ever gets a complete rewrite.
>
> Or add epochs to the MonCommands so that clients can explicitly wait.
> There is almost precedent for this in that PaxosService messages (special
> purpose non-command messages) have a version in them and their replies
> generally include one as well. It would take quite a bit of work to
> extend this to include commands, though, and even if we did there are some
> commands that span multiple services and thus have an ill-defined
> version/epoch to pin themselves to. This would require a lot of work and
> at the end of the day would require extra code on the clients to be
> "correct"... code that would never actually be exercised because, in
> reality, the current mon implementation always returns the maps before the
> reply.
>
> I can't think of any reason why we'd opt for #2 given the opportunity
> cost.
>
> sage

-- 
Regards
Kefu Chai
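For comparison, the client-side logic that option #2 in the quoted text would need (an epoch carried in the command ack, and a client that waits for its local map to reach it) could look roughly like the sketch below. Everything here is an assumption made for illustration: `LocalOSDMap`, `handle_map_update`, `wait_for_epoch`, `complete_command`, and the `ack_epoch` parameter are hypothetical names, not the Objecter or ceph-mgr API, and the command ack does not carry such an epoch today.

```python
import threading


class LocalOSDMap:
    """Hypothetical stand-in for the manager's local copy of the osdmap."""

    def __init__(self, epoch=0):
        self._epoch = epoch
        self._cond = threading.Condition()

    @property
    def epoch(self):
        with self._cond:
            return self._epoch

    def handle_map_update(self, epoch):
        # Called when a newer map arrives via the subscription.
        with self._cond:
            if epoch > self._epoch:
                self._epoch = epoch
                self._cond.notify_all()

    def wait_for_epoch(self, epoch, timeout=None):
        # Block until the local map is at least `epoch`.
        # Returns True on success, False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._epoch >= epoch, timeout=timeout)


def complete_command(osdmap, ack_epoch, timeout=5.0):
    """Treat a command as complete only once the local map has caught up."""
    if not osdmap.wait_for_epoch(ack_epoch, timeout=timeout):
        raise TimeoutError(f"osdmap did not reach epoch {ack_epoch}")
    return osdmap.epoch


osdmap = LocalOSDMap(epoch=10)
# Simulate the map update arriving shortly after the ack (hypothetical epochs).
threading.Timer(0.05, osdmap.handle_map_update, args=(12,)).start()
result = complete_command(osdmap, ack_epoch=12)
print(result)
```

With the ordering that the mon currently guarantees, the wait in `complete_command` would return immediately every time, which is Sage's point about this code never actually being exercised.
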