Re: crimson-osd queues discussion

On Thu, Mar 14, 2019 at 5:43 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Wed, Mar 13, 2019 at 12:16 AM kefu chai <tchaikov@xxxxxxxxx> wrote:
> >
> > On Wed, Mar 13, 2019 at 2:31 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Mar 8, 2019 at 6:23 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
> > > > yes, i am also in favor of this. see
> > > > https://github.com/ceph/ceph/pull/26697/commits/81e906d82d9e04ebe5b8b230d424b300ebff2f93
> > > > and https://github.com/ceph/ceph/pull/26697/commits/d64c7c8022cacfc787231cfa61d9ea0fdcc58013#diff-1449683df2509676ff6b4977eff7e74bR660
> > > > for examples . to chain the producer and consumer in the same place
> > > > helps with the readability and probably helps with the performance.
> > >
> > > These links have gone stale in a rebase; can you share the commit titles?
> >
> > sure, they are
> > - "crimson/osd: wait osdmap before processing peering evt"
> > - "crimson/osd/pg: wait until pg is active"
> > in https://github.com/ceph/ceph/pull/26697
>
> Thanks.
>
> >
> > > Browsing some of them, I see that "crimson/osd: wait osdmap before
> > > processing peering evt" has added an explicit "waiting_peering"
> > > application queue that I presume we have to maintain just like all the
> > > others that exist in the classic code.
> >
> > to be specific, it is a map of shared_promise, ideally any
> > request/event expecting an osdmap not available yet should be waiting
> > for a future returned by the promise stored in this map. once a
> > new-enough map is consumed, all waiters of maps older than that map
> > will be awoken and continue with whatever they were doing. currently,
> > only peering messages are using this facility. once there are more
> > consumers, probably we should rename it to osdmap_promises.
>
> Right.
> Now, in defense of this approach, they do look simpler than the queues
> we have right now. I'm just not sure if that's because there are fewer
> queues and interactions, or if they're going to maintain that
> simplicity. What's the plan going forward?
>

For more context, I am quoting Chunmei's question and Sage's reply from
earlier in this thread:

> >   1. is it necessary to go through all the precondition checks again
> > from the beginning or we can continue from the blocked check?
>
> Look at PG.h line ~1303 or so for a summary of the various queues.  It's a
> mix: about half of them block and then stop blocking, never to block
> again, until a new peering interval.  The others can start/stop blocking
> at any time.
>
> I think this means that we should repeat all of the precondition checks.

In https://github.com/ceph/ceph/pull/26697, I am only performing two checks:

- wait for the osdmap before handling a peering message (sketched below)
- wait for the PG to become active before serving a read OSD op

So this change models a simplified version of the real world, and it
probably oversimplifies it, as it does not take the multiple checks into
consideration.
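
For illustration, here is a minimal sketch of the shared_promise map
described above; the class and member names (osdmap_gate, osdmap_promises,
consume_map, newest_epoch) are made up for this sketch, and the actual code
in the PR may organize it differently:

#include <cstdint>
#include <map>
#include <seastar/core/future.hh>
#include <seastar/core/shared_future.hh>

using epoch_t = uint32_t;   // same width as ceph's epoch_t

class osdmap_gate {
  // epoch -> promise shared by every request waiting for that epoch
  std::map<epoch_t, seastar::shared_promise<>> osdmap_promises;
  epoch_t newest_epoch = 0;
public:
  // returns a ready future if we already have a new-enough map, otherwise
  // a future that resolves once such a map has been consumed
  seastar::future<> wait_for_map(epoch_t epoch) {
    if (epoch <= newest_epoch) {
      return seastar::now();
    }
    return osdmap_promises[epoch].get_shared_future();
  }
  // called after a new osdmap has been consumed: wake up every waiter
  // whose requested epoch is now satisfied
  void consume_map(epoch_t epoch) {
    newest_epoch = epoch;
    auto first_unready = osdmap_promises.upper_bound(epoch);
    for (auto i = osdmap_promises.begin(); i != first_unready; ++i) {
      i->second.set_value();
    }
    osdmap_promises.erase(osdmap_promises.begin(), first_unready);
  }
};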

> When I was thinking about this before: In the "classic" code, we have
> to be careful in a lot of places to explicitly wake up queues in the
> right order. If we try and do it implicitly with futures, it would
> still look a lot like that but I think we might need/be able to map
> more explicitly to blocking operations ahead of us in line, and then
> we wouldn't need to worry about ordering wake-ups properly. I'm not
> sure if this current approach will take care of that or not?
> That is, in addition to waiting for specific epochs, we might need to
> also track futures for things like "the last operation from this
> client" or "to this PG" and block on them plus the other conditions of
> our own. Does that make any sense?

Yes, it makes sense to me! We could expand the current approach into a
grand wait_for() condition driven by multiple sub-conditions, like this
(in pseudo code):
seastar::future<> wait_for(req) {
  return wait_for_map(req->epoch).then([req] {
    // loop over the remaining checks until one full pass completes
    // without any of them having to block
    return seastar::do_with(true, [req](bool& interrupted) {
      return seastar::do_until([&] { return !interrupted; }, [&, req] {
        interrupted = false;
        return wait_for_peered(req->pg)
          .then([req] {
            return wait_for_active(req->pg);
          }).then([req] {
            return wait_for_flush(req->pg);
          }).then([req] {
            return wait_for_scrub(req->oid);
          }).then([req] {
            return wait_for_unreadable(req->oid);
          }).then([req] {
            return wait_for_blocked(req->oid);
          }).then([req] {
            return wait_for_rw_locks(req->oid);
          }).handle_exception_type([&](const blocked_exception&) {
            // one of the checks had to block: redo all of them
            interrupted = true;
          });
      });
    });
  });
}

seastar::future<> wait_for_scrub(hoid oid) {
  if (!is_scrubbing(oid)) {
    return seastar::now();
  }
  // we had to wait, so tell the caller to redo all of the checks
  return do_wait_for_scrub(oid).then([] {
    throw blocked_exception{};
  });
}

So if any of the checks fails to be satisfied, it will redo all of them.
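
To illustrate the consumer side, a hypothetical caller on the client-op
path could simply chain on the future returned by wait_for(); the names
handle_op, do_osd_op and req_ref below are made up for this sketch:

// hypothetical caller: the op just chains behind wait_for(), so there is
// no per-condition queue that has to be woken up in the right order
seastar::future<> handle_op(req_ref req) {
  return wait_for(req).then([req] {
    // a full pass of the checks completed without blocking,
    // so it is safe to execute the op now
    return do_osd_op(req);
  });
}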


-- 
Regards
Kefu Chai


