Re: Ordering subscription messages to MonClient vs. command responses

kefu chai <tchaikov@xxxxxxxxx> · Tue, 27 Mar 2018 17:26:08 +0800

On Tue, Mar 20, 2018 at 6:45 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Mon, 19 Mar 2018, Gregory Farnum wrote:
>> On Mon, Mar 19, 2018 at 7:33 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>> > Hi all,
>> >
>> > I was looking at places in ceph-mgr where we send a command from a
>> > module, and then want to proceed with some logic that involves reading
>> > the osdmap (there is a local copy in the manager, maintained by
>> > Objecter).
>> >
>> > I had been thinking that we should include cluster map epochs in the
>> > MMonCommandAck messages so that the client can (optionally) wait for
>> > that latest OSDMap before it considers the command complete.
>> >
>> > Then I thought, maybe this isn't necessary at all, because the mons
>> > would be doing the check_subs() etc calls before they actually respond
>> > to commands, so clients would always get their updated maps before
>> > seeing a command response message.
>> >
>> > So: mon experts, what do you think?  Is it safe to assume that clients
>> > will get their subscription updates before a command completion (even
>> > in the case of commands being forwarded)?  Or do we maybe need a
>> > little bit more logic on the client side in the manager?
>>
>> I would expect the order of op replies versus subscription fulfillment
>> messages to be an implementation detail, even if we do currently spool
>> off new map subscription requests inline with committing them. (I
>> don’t know at all if that’s the case.)
>
> Currently all of the subs are satisfied by update_from_paxos(), which
> means they get fulfilled before any replies (which are waiting_for_commit
> completions). Having recently fixed one of the monitor services to do this
> that wasn't in order to fix a subscription bug, I'm pretty confident this
> is the "right" place to do it given how the mon is currently structured.

If the command *updates* the state of the monitor, in the sense that it
triggers a proposal, I think it's safe to assume that the client which sent
the command will be updated with the latest osdmap. But if it just *queries*
the cluster status from the mon, and the behavior of the client depends on the
osdmap, there is a risk of a race. John, which specific ceph-mgr module or
calling path were you looking at?
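
For the query case, the client-side wait would look roughly like the sketch
below. This is only an illustration of "wait until the local map reaches a
given epoch", not existing ceph-mgr code: MapCache, on_map_update() and
wait_for_epoch() are hypothetical names, and it assumes the command reply
carries an epoch, which is not the case today.

```python
import threading


class MapCache:
    """Hypothetical stand-in for the manager's local osdmap copy."""

    def __init__(self):
        self._epoch = 0
        self._cond = threading.Condition()

    def on_map_update(self, epoch):
        # Would be called when a subscription update delivers a map.
        # Older or duplicate epochs are ignored.
        with self._cond:
            if epoch > self._epoch:
                self._epoch = epoch
                self._cond.notify_all()

    def wait_for_epoch(self, epoch, timeout=None):
        # Block until the local map is at least `epoch`.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._epoch >= epoch, timeout)

    @property
    def epoch(self):
        with self._cond:
            return self._epoch
```

A caller would treat the command as complete only after
wait_for_epoch(reply_epoch) returns True, where reply_epoch is the
(assumed) epoch carried in the reply.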

>
> I think we have two options: acknowledge and enshrine this as part of the
> mon protocol as John suggests.  No code changes but some small risk of
> regretting this if the mon ever gets a complete rewrite.
>
> Or add epochs to the MonCommands so that clients can explicitly wait.
> There is almost precedent for this in that PaxosService messages (special
> purpose non-command messages) have a version in them and their replies
> generally include one as well.  It would take quite a bit of work to
> extend this to include commands, though, and even if we did there are some
> commands that span multiple services and thus have an ill-defined
> version/epoch to pin themselves to.  This would require a lot of work and
> at the end of the day would require extra code on the clients to be
> "correct"... code that would never actually be exercised because, in
> reality, the current mon implementation always returns the maps before the
> reply.
>
> I can't think of any reason why we'd opt for #2 given the opportunity
> cost.
>
> sage

-- 
Regards
Kefu Chai