Re: Ordering subscription messages to MonClient vs. command responses

On Tue, Mar 27, 2018 at 10:26 AM, kefu chai <tchaikov@xxxxxxxxx> wrote:
> On Tue, Mar 20, 2018 at 6:45 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> On Mon, 19 Mar 2018, Gregory Farnum wrote:
>>> On Mon, Mar 19, 2018 at 7:33 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>> > Hi all,
>>> >
>>> > I was looking at places in ceph-mgr where we send a command from a
>>> > module, and then want to proceed with some logic that involves reading
>>> > the osdmap (there is a local copy in the manager, maintained by
>>> > Objecter).
>>> >
>>> > I had been thinking that we should include cluster map epochs in the
>>> > MMonCommandAck messages so that the client can (optionally) wait for
>>> > that latest OSDMap before it considers the command complete.
>>> >
>>> > Then I thought, maybe this isn't necessary at all, because the mons
>>> > would be doing the check_subs() etc calls before they actually respond
>>> > to commands, so clients would always get their updated maps before
>>> > seeing a command response message.
>>> >
>>> > So: mon experts, what do you think?  Is it safe to assume that clients
>>> > will get their subscription updates before a command completion (even
>>> > in the case of commands being forwarded)?  Or do we maybe need a
>>> > little bit more logic on the client side in the manager?
>>>
>>> I would expect the order of op replies versus subscription fulfillment
>>> messages to be an implementation detail, even if we do currently spool
>>> off new map subscription requests inline with committing them. (I
>>> don’t know at all if that’s the case.)
>>
>> Currently all of the subs are satisfied by update_from_paxos(), which
>> means they get fulfilled before any replies (which are waiting_for_commit
>> completions). Having recently fixed one of the monitor services that
>> wasn't doing this, in order to fix a subscription bug, I'm pretty confident
>> this is the "right" place to do it given how the mon is currently structured.
>
> if the command *updates* the status of the monitor in the sense that it
> triggers a proposal, i think it's safe to assume that the client which sends
> the command will be updated with the latest osdmap. but if it just *queries*
> the cluster status from the mon, and the behavior of the client depends on
> the osdmap, there is a risk of racing. John, what specific ceph-mgr module or
> calling path were you looking at?

I was thinking specifically of updates.  This was in some code I'm
working on that creates pools: need to make sure that after my "osd
pool create" command is done, it'll be reflected in the local OSDMap.

It seems like the consensus is that it would be worthwhile to include
map epochs in the command response for commands that update things.
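
A minimal sketch of what that client-side wait could look like, assuming
the command ack carried an "epoch" field (it does not today). Every name
here (LocalMapTracker, on_map_update, command_complete, the "retcode" and
"epoch" keys) is hypothetical and only illustrates the idea; none of it is
existing ceph-mgr or MonClient API.

```python
import threading


class LocalMapTracker:
    """Hypothetical stand-in for the locally maintained OSDMap epoch."""

    def __init__(self):
        self._epoch = 0
        self._cond = threading.Condition()

    def on_map_update(self, epoch):
        # Called when a subscription update delivers a newer map.
        with self._cond:
            if epoch > self._epoch:
                self._epoch = epoch
                self._cond.notify_all()

    def wait_for_epoch(self, epoch, timeout=None):
        # Block until the local epoch reaches `epoch`; False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._epoch >= epoch, timeout)


def command_complete(tracker, ack, timeout=5.0):
    """Treat a command as complete only once the local map has caught up.

    `ack` is an assumed dict form of the command response; the "epoch"
    key is the proposed addition, not a field that exists today.
    """
    if ack["retcode"] != 0:
        return False
    epoch = ack.get("epoch")
    if epoch is None:
        # No epoch in the ack: nothing to wait for (e.g. a pure query).
        return True
    return tracker.wait_for_epoch(epoch, timeout)
```

With something like this, the pool-creating code would call
command_complete() on the ack for "osd pool create" and only read the
local OSDMap once it returns True.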

John

>
>>
>> I think we have two options: acknowledge and enshrine this as part of the
>> mon protocol as John suggests.  No code changes but some small risk of
>> regretting this if the mon ever gets a complete rewrite.
>>
>> Or add epochs to the MonCommands so that clients can explicitly wait.
>> There is almost precedent for this in that PaxosService messages (special
>> purpose non-command messages) have a version in them and their replies
>> generally include one as well.  It would take quite a bit of work to
>> extend this to include commands, though, and even if we did there are some
>> commands that span multiple services and thus have an ill-defined
>> version/epoch to pin themselves to.  This would require a lot of work and
>> at the end of the day would require extra code on the clients to be
>> "correct"... code that would never actually be exercised because, in
>> reality, the current mon implementation always returns the maps before the
>> reply.
>>
>> I can't think of any reason why we'd opt for #2 given the opportunity
>> cost.
>>
>> sage
>
>
>
> --
> Regards
> Kefu Chai