Re: wait_for_* in crimson-osd's read path

On Tue, Feb 12, 2019 at 6:50 AM kefu chai <tchaikov@xxxxxxxxx> wrote:
>
> Chunmei,
>
> to continue the discussion in the last crimson standup, i am noting
> down some of our findings when reading Radoslaw's
> https://github.com/ceph/ceph/pull/24962.
>
> there are multiple places where we need to put a request on hold
> until an unmet precondition is satisfied. as Yingxin noted, it is
> always more efficient to enqueue a request in the application's own
> queue than to capture it in a continuation and stash it in the
> reactor's task queue.

Do we know how much more efficient it is?
I ask because maintaining these queues is one of the buggier systems
in the existing OSD; we can get them very stable through extensive
testing, but any change tends to take a while to flush the bugs out.
I was very much looking forward to just making all of those status
checks a per-op future/promise rather than having to do the checks,
shuffle ops aside, and then do more complicated checks on the other
ops about whether those queues exist. :(
-Greg

> the downsides of the pending request queue are that
>
> - apparently we need to maintain a queue for each precondition that
> could hold the requests; see the `waiting_for_*` lists/maps in PG.h.
> but we also need to keep track of pending futures if we want to chain
> the maybe_wait_*() calls as https://pad.ceph.com/p/crimson-io-path puts
> it. and the pending futures are likely to be structured in much the
> same way as these `waiting_for_*` lists; the only difference is that
> the values of these containers would be futures (see the sketch after
> this list).
> - the op will need to go through the same checks once its precondition
> is satisfied and it is enqueued again. probably we need to check if
> there are any preconditions that imply other preconditions. if so, is
> it feasible/worthwhile to /continue/ performing this request instead
> of redoing all the checks? can we reorder some of the checks for
> better performance? or for better readability?
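>
> for instance, something like this (just a sketch, the names are made
> up; the point is only that the container has the same shape as
> today's waiting_for_* structures, with promises as the values rather
> than the requests themselves):
>
>   // sketch: a waiting_for_map keyed like today's lists/maps, but the
>   // values are promises for the pending futures to chain on
>   #include <cstdint>
>   #include <map>
>   #include <seastar/core/future.hh>
>   #include <seastar/core/shared_future.hh>
>
>   using epoch_t = uint32_t;  // as in ceph
>
>   // the op itself is not stashed anywhere, only its continuation is
>   std::map<epoch_t, seastar::shared_promise<>> waiting_for_map;
>
>   seastar::future<> maybe_wait_for_map(epoch_t need, epoch_t have) {
>     if (have >= need) {
>       return seastar::make_ready_future<>();
>     }
>     return waiting_for_map[need].get_shared_future();
>   }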
>
> if we need to redo all the checks like we do in the existing
> ceph-osd, we can either
>
> - use a grand seastar::repeat() to redo a request until we run into
> some exception or the request is served (see the sketch below), or
> - use a queue for tracking the pending requests, and rerun them in
> the fiber that fulfills the precondition. for instance, if a batch of
> requests is waiting for an updated osdmap, then after consuming the
> updated osdmap the PG will need to serve all of the requests that are
> waiting for it.
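>
> the first option would look roughly like this (sketch only;
> do_checks(), wait_for_precondition() and do_op() are placeholders for
> whatever the real pipeline ends up being):
>
>   // sketch of the "grand seastar::repeat()" option: redo the request
>   // from the top until it is served or an exception propagates
>   #include <seastar/core/future.hh>
>   #include <seastar/core/future-util.hh>  // seastar::repeat, stop_iteration
>   #include <seastar/core/shared_ptr.hh>
>
>   struct Op {
>     // the request state
>   };
>   using OpRef = seastar::lw_shared_ptr<Op>;
>
>   seastar::future<bool> do_checks(OpRef op);          // all the precondition checks
>   seastar::future<> wait_for_precondition(OpRef op);  // wait for the unmet one
>   seastar::future<> do_op(OpRef op);                  // actually serve the request
>
>   seastar::future<> serve(OpRef op) {
>     return seastar::repeat([op] {
>       return do_checks(op).then([op] (bool ready) {
>         if (!ready) {
>           // precondition not met yet: wait for it, then loop around
>           // and redo all the checks from the top
>           return wait_for_precondition(op).then([] {
>             return seastar::stop_iteration::no;
>           });
>         }
>         return do_op(op).then([] {
>           return seastar::stop_iteration::yes;
>         });
>       });
>     });
>   }
>
> (the second option would instead drain a plain queue of pending
> requests from the same place that fulfills the precondition.)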
>
> what do you think?
>
> --
> Regards
> Kefu Chai


