On Wed, Mar 6, 2019 at 3:49 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> I realize I'm a bit late here, but I had some thoughts I wanted to get
> out as well...
>
> On Wed, Feb 20, 2019 at 7:54 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > On Thu, 21 Feb 2019, Liu, Chunmei wrote:
> > > Hi all,
> > >
> > > Here we want to discuss the multiple queues in ceph-osd and how we can
> > > implement crimson-osd more efficiently with or without these queues.
> > >
> > > We noticed there are multiple places in the current ceph-osd where a
> > > request is enqueued when some precondition is not satisfied, such as
> > > session->waiting_on_map (waiting for a map), slot->waiting (waiting for
> > > a PG), and waiting_for/map/peered/active/flush/scrub/** etc. in pg.h. We
> > > need to hold the request in these waiting queues; when a certain
> > > precondition is satisfied, the enqueued requests are dequeued and
> > > requeued at the front of the work queue, where they go through all the
> > > precondition checks again from the beginning.
> > >
> > > 1. Is it necessary to go through all the precondition checks again
> > > from the beginning, or can we continue from the check that blocked?
> >
> > Look at PG.h line ~1303 or so for a summary of the various queues. It's a
> > mix: about half of them block and then stop blocking, never to block
> > again, until a new peering interval. The others can start/stop blocking
> > at any time.
> >
> > I think this means that we should repeat all of the precondition checks.
> >
> > > Crimson-osd is based on the seastar framework and uses
> > > future/promise/continuation chains. When a task's precondition is not
> > > satisfied, it returns a future immediately, and when the promise
> > > fulfills the future, the continuation task is pushed to the task queue
> > > of the seastar reactor to be scheduled. In this case we still need to
> > > hold a queue for each precondition to keep track of the pending
> > > futures, and when a precondition is satisfied, call the waiting
> > > futures' promises to fulfill them.
> > >
> > > 2. We have two choices here:
> > >    a) Use the application's own queues to schedule requests, just like
> > >       the current ceph-osd (enqueue/dequeue a request from one queue to
> > >       another when a precondition is not satisfied). In this case the
> > >       seastar reactor task scheduler is not involved.
> > >    b) Use the seastar reactor task queue: use the
> > >       future/promise/continuation model when a precondition is not
> > >       satisfied and let the seastar reactor do the scheduling (this
> > >       still needs application queues for tracking pending futures).
> > > From our crimson-messenger experience, for some simple repeated
> > > actions such as send-message, the application queue seems more
> > > efficient than the seastar reactor task queue. We are not sure whether
> > > it is still more efficient for a complex case like osd/pg.
> > > Which one is better for crimson-osd?
> >
> > My gut says this will make for more robust code anyway to use an
> > application queue, and the blocking is relatively rare, so I wouldn't
> > worry about the overhead of repeating those checks. But... I don't have
> > any experience or intuition around what makes sense in the future/promise
> > style of things. :/
>
> I'm actually on the other side of this fence. The queues are fairly
> stable now, but getting them to that point took a long time and
> maintaining them correctly is still one of the most finicky parts of
> making real changes in the OSD code. They are a big piece of "global"
> state that don't show up in many places but are absolutely critical to
> maintain correctly, so it's hard for developers to learn the rules
> about them AND easy to miss that they need to be considered when
> making otherwise-unrelated changes.
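To make option b) and Sage's "repeat all of the precondition checks" concrete, here is a minimal sketch that uses Python's asyncio as a stand-in for the seastar reactor; it is not crimson code, and Precondition and handle_op are hypothetical names. Each precondition keeps a list of pending futures (the per-precondition bookkeeping Chunmei mentions), and a blocked op starts over from the first check after every wakeup, because some gates can start blocking again at any time.

```python
import asyncio


class Precondition:
    """Hypothetical gate such as "have the map" or "PG is active".

    Blocked ops park on futures in a per-precondition waiter list (the
    bookkeeping option b) still needs); set(True) fulfills all of them.
    """

    def __init__(self, name, satisfied=False):
        self.name = name
        self.satisfied = satisfied
        self._waiters = []  # one pending future per blocked op

    async def wait(self):
        if self.satisfied:
            return  # fast path: no future is created
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        await fut

    def set(self, value):
        self.satisfied = value
        if value:
            waiters, self._waiters = self._waiters, []
            for fut in waiters:
                if not fut.done():
                    fut.set_result(None)  # schedules the parked op to resume


async def handle_op(op, preconditions, log):
    # After every wakeup, start again from the first check: some gates
    # (e.g. flush or scrub) can start blocking again at any time.
    while True:
        blocked = next((p for p in preconditions if not p.satisfied), None)
        if blocked is None:
            break
        await blocked.wait()
    log.append(op)  # stand-in for actually executing the op


async def main():
    have_map = Precondition("have_map", satisfied=True)
    active = Precondition("pg_active")
    log = []
    tasks = [asyncio.create_task(handle_op(f"op{i}", [have_map, active], log))
             for i in range(3)]
    await asyncio.sleep(0)  # let every op run until it blocks on pg_active
    assert log == []
    active.set(True)        # wakes all three parked ops
    await asyncio.gather(*tasks)
    return log


print(asyncio.run(main()))  # ['op0', 'op1', 'op2']
```

Because the op loops back to the first check after each wakeup, a gate that flips back to blocked simply parks the op again on that gate's waiter list; the op is never moved between explicit queues, and the waiting logic lives at the point where the precondition is asserted.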
>
> I was very much hoping that we could turn all of that explicit mapping
> into implicit dependency chains that validate preconditions (or pause
> until they are satisfied) using futures that can be handled by the
> reactor and otherwise only need to be considered at the point where
> they are asserted, rather than later on at random places in the code.
> I *think* this is feasible?

Yes, I am also in favor of this. See
https://github.com/ceph/ceph/pull/26697/commits/81e906d82d9e04ebe5b8b230d424b300ebff2f93
and
https://github.com/ceph/ceph/pull/26697/commits/d64c7c8022cacfc787231cfa61d9ea0fdcc58013#diff-1449683df2509676ff6b4977eff7e74bR660
for examples. Chaining the producer and the consumer in the same place
helps readability, and probably performance as well.

>
> There is a bit of a challenge to this when debugging blocked ops, but
> I presume we'll need to develop a robust way of checking reactor
> dependency chains anyway, so I don't think it should be any worse than
> if we had to build up debugging around all the queues.

Ahh, this is a good point. I never thought about a way to check the
dependency chain. This would need a probe touching the innards of the
reactor.

>
> > >
> > > 3. For QoS, do we have to use some application queue to implement
> > > QoS? That is, can we not avoid an application queue for QoS?
> >
> > Yeah, I think we'll need the app queue for this anyway!
>
> It would depend on exactly what functionality we need, but I don't
> think this is accurate. We can chain futures such that we wait for the
> previous client op to complete, then wait on a timer, if we are just
> limiting it to an absolute IOP rate. dmclock is a little harder, but we
> can also do futures that are satisfied by the completion of a single
> "child" future, which would let us combine many different conditions
> together and probably build that model.
> -Greg

--
Regards
Kefu Chai
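Greg's QoS suggestion for an absolute IOP rate (wait for the previous client op to complete, then wait on a timer) can be sketched the same way, again with asyncio standing in for the seastar reactor. RateLimitedChain is a hypothetical name, and the sketch covers only the fixed-rate case, not dmclock.

```python
import asyncio


class RateLimitedChain:
    """Hypothetical per-client QoS gate (asyncio stand-in for seastar).

    Each submitted op first waits for the previous op on the same chain to
    complete, then waits on a timer, so the chain runs at most `iops` ops/s.
    """

    def __init__(self, iops):
        self.interval = 1.0 / iops
        self._tail = None  # task of the most recently submitted op

    def submit(self, op):
        # Link the new op onto the tail of the chain; no explicit queue needed.
        task = asyncio.create_task(self._run(self._tail, op))
        self._tail = task
        return task

    async def _run(self, prev, op):
        if prev is not None:
            # asyncio.wait() does not raise if prev failed, so one bad op
            # does not break the rest of the chain.
            await asyncio.wait({prev})
            await asyncio.sleep(self.interval)  # then wait on a timer
        return await op()


async def demo():
    loop = asyncio.get_running_loop()
    chain = RateLimitedChain(iops=50)  # at most ~50 ops/s, i.e. 20 ms apart
    starts = []

    async def op():
        starts.append(loop.time())
        return len(starts)

    results = await asyncio.gather(*(chain.submit(op) for _ in range(5)))
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return results, gaps


results, gaps = asyncio.run(demo())
print(results)  # [1, 2, 3, 4, 5]
```

The chain itself is the only state: each op holds a reference to its predecessor, so there is no separate queue to keep consistent. In asyncio terms, the "child" future Greg mentions for dmclock resembles asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED), which returns as soon as any one of several futures is done.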