RE: crimson-osd queues discussion

> -----Original Message-----
> From: kefu chai [mailto:tchaikov@xxxxxxxxx]
> Sent: Thursday, February 21, 2019 3:36 AM
> To: Sage Weil <sage@xxxxxxxxxxxx>
> Cc: Liu, Chunmei <chunmei.liu@xxxxxxxxx>; The Esoteric Order of the Squid
> Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: crimson-osd queues discussion
> 
> On Thu, Feb 21, 2019 at 11:57 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > On Thu, 21 Feb 2019, Liu, Chunmei wrote:
> > > Hi all,
> > >
> > >   Here we want to discuss ceph-osd's multiple queues and how we can
> > > implement crimson-osd more efficiently, with or without these queues.
> > >
> > >   We noticed that the current ceph-osd enqueues a request in several
> > > places when some precondition is not satisfied, such as
> > > session->waiting_on_map (waiting for the map), slot->waiting (waiting
> > > for session->pg), and waiting_for/map/peered/active/flush/scrub/**
> > > etc. in pg.h. We hold the request in these waiting queues, and when
> > > the precondition is satisfied, the enqueued request is dequeued and
> > > pushed to the front of the work queue again, to go through all the
> > > precondition checks from the beginning.
> > >
> > >   1. Is it necessary to go through all the precondition checks again
> > > from the beginning, or can we continue from the blocked check?
> >
> > Look at PG.h line ~1303 or so for a summary of the various queues.
> > It's a
> > mix: about half of them block and then stop blocking, never to block
> > again, until a new peering interval.  The others can start/stop
> > blocking at any time.
> >
> > I think this means that we should repeat all of the precondition checks.
> >
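To make Sage's point concrete, here is a minimal, hypothetical sketch (plain C++, no Seastar; all names are illustrative, not crimson code) of why re-running the full check list is the safe choice: since a blocker like scrub can start blocking again at any time, a request released by one waiting queue must be re-tested against every precondition, not just the one that previously held it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: a request re-runs the *full* ordered list of
// precondition checks each time it is re-queued, because some blockers
// (e.g. scrub) may start blocking again after previously passing.
struct Request {
  std::string name;
};

struct PreconditionChecks {
  // Each check returns true when the request may proceed past it.
  std::vector<std::function<bool(const Request&)>> checks;

  // Index of the first failing check, or checks.size() when all
  // preconditions are satisfied and the request can be processed.
  size_t first_blocker(const Request& r) const {
    for (size_t i = 0; i < checks.size(); ++i) {
      if (!checks[i](r)) {
        return i;  // request must wait on this precondition's queue
      }
    }
    return checks.size();
  }
};
```

A request that was blocked at check 0, once released, may still find check 1 blocking it, which is why the scan always restarts at index 0.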
> > >    Crimson-osd is based on the seastar framework and uses
> > > future/promise/continuation chains: when a task's precondition is
> > > not currently satisfied, it returns a future immediately, and when
> > > the promise fulfills the future, the continuation task is pushed to
> > > the seastar reactor's task queue to be scheduled.  In this case we
> > > still need a queue per precondition to keep track of pending
> > > futures, so that when the precondition is satisfied we can call the
> > > waiting promises to fulfill the futures.
> > >
> > >    2. We have two choices here: a) use the application's own queues
> > > to schedule requests, just like the current ceph-osd
> > > (enqueue/dequeue a request from one queue to another when a
> > > precondition is not satisfied); in this case the seastar reactor
> > > task scheduler is not involved. b) Use the seastar reactor task
> > > queue; in this case, when a precondition is not satisfied, use the
> > > future/promise/continuation model and let the seastar reactor do the
> > > scheduling (this also needs application queues to track pending
> > > futures).
> > >      From our crimson-messenger experience, for a simple repeated
> > > action such as send-message, an application queue seems more
> > > effective than the seastar reactor task queue.  We are not sure
> > > whether this still holds for a complex case like osd/pg.
> > >     Which one is better for crimson-osd?
> >
> > My gut says that using an application queue will make for more robust
> > code anyway, and the blocking is relatively rare, so I wouldn't worry
> > about the overhead of repeating those checks.  But... I don't have
> > any experience or intuition around what makes sense in the
> > future/promise style of things.  :/
> >
> > >    3. For QoS, do we have to use some application queue to
> > > implement it?  That is, can we not avoid an application queue for
> > > QoS?
> >
> > Yeah, I think we'll need the app queue for this anyway!
> 
> a straightforward translation from the existing model would look like
> this:
>  - each connection to a rados client will push the decoded requests to
> a queue with QoS support. queue.push_back() will block if the QoS
> policy asks the client to back off or wait; see also
> seastar::queue::push_eventually(). the fiber will also likely be
> blocked when it's trying to read more messages from the client.
--- So you mean we implement this kind of queue: when the QoS policy is not satisfied, return an unfulfilled future instead of pushing to the queue, and otherwise push to the queue?

>  - each objectstore or pg backend will have a loop that keeps grabbing
> requests from its queue, processing them, and replying to the client
> with the result, until the osd instance is asked to stop or the PG is
> deleted. this fiber will be blocked when there are no pending requests
> in the queue.
---- Apart from the QoS queue above, do you mean we still need other queues for the different layers (the PG layer and the ObjectStore layer), or not?
---- BTW, with this kind of seastar::queue we still need push and pop, scheduled by the reactor; I am not sure it is more effective than an application queue, but it does help chain tasks.
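A minimal single-threaded model of that per-PG consumer loop (plain C++, illustrative names only; in Seastar the "block when empty" behaviour would come from waiting on a futurized queue rather than from returning):

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Illustrative sketch, not crimson code: one pass of the consumer
// fiber's loop body. It drains its request queue, processes each
// request, and records a reply, until the queue is empty or the OSD
// asks it to stop. In Seastar, the fiber would instead suspend on a
// future when the queue is empty.
struct PgConsumer {
  std::deque<std::string> queue;     // pending requests
  std::vector<std::string> replies;  // stand-in for replies to clients
  bool stopping = false;

  void drain() {
    while (!stopping && !queue.empty()) {
      std::string req = queue.front();
      queue.pop_front();
      replies.push_back("done:" + req);  // "process and reply"
    }
  }
};
```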
> 
> yeah, we probably cannot avoid a queue/bucket when implementing proper
> QoS, but i think we can have a futurized queue, so we can avoid the
> enqueue/dequeue if the underlying "worker thread" is fast enough to
> consume the request *immediately*. put another way, is it possible to
> avoid the fiber context switch if we know that the queue is ready to
> pop() the current request? namely, we can have a slightly different
> implementation.
> 
> so the producer side will look like,
> 
>   do_until([this] { return _stopping; },
>            [this] {
>              return conn.read_request().then([this](auto req) {
>                // make_queueable() is a generic function which extracts
>                // the weight/cost/priority from the given request
>                return queue.push_and_pop(make_queueable(req))
>                  .then([req, this] {
>                    // if this req is lucky enough, it won't need to wait
>                    // even a jiffy before being served.
>                    return do_request(req);
>                  }).then([this](auto resp) {
>                    return conn.send_response(resp);
>                  });
>              });
>            });
> 
> the above would be the client connection's start() method, if we
> choose an optimistic queue, where in the best case a request is
> handled immediately without being blocked. but on the downside, all
> the pending clients will need to wrap their pending requests in a
> seastar task and wait on its promise, and the queue will need to keep
> track of all the pending futures of the pending client requests.
--- For the requests whose preconditions are not satisfied, do we need one queue per precondition (used to track pending futures), like in pg.h?
Then how do we repeat the precondition checks if we chain all the tasks?
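One way to model the optimistic fast path described above (plain C++ callbacks standing in for futures; this is an illustrative sketch, not Seastar's queue API): push_and_pop() runs the handler immediately when nothing is pending and capacity is available, so the lucky request never pays for an enqueue or a context switch; otherwise the work is parked as a pending continuation that done() wakes later.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Illustrative model of an "optimistic" futurized queue. Names and
// structure are assumptions for the sketch, not Seastar or crimson API.
class OptimisticQueue {
  size_t in_flight_ = 0;
  size_t capacity_;
  std::deque<std::function<void()>> pending_;

 public:
  explicit OptimisticQueue(size_t capacity) : capacity_(capacity) {}

  // Fast path: run fn now, with no enqueue and no context switch.
  // Slow path: wrap the work as a pending continuation.
  void push_and_pop(std::function<void()> fn) {
    if (pending_.empty() && in_flight_ < capacity_) {
      ++in_flight_;
      fn();  // lucky request: served immediately
    } else {
      pending_.push_back(std::move(fn));  // parked pending task
    }
  }

  // Called when an in-flight request completes; wakes one pending task.
  void done() {
    --in_flight_;
    if (!pending_.empty() && in_flight_ < capacity_) {
      std::function<void()> fn = std::move(pending_.front());
      pending_.pop_front();
      ++in_flight_;
      fn();
    }
  }
};
```

In the futurized version, the fast path would return a ready future and the slow path an unresolved one, which is exactly the trade-off described above: the common case avoids the queue entirely, while blocked requests still need their futures tracked.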
> 
> --
> Regards
> Kefu Chai



