On 6/23/20 6:32 AM, Pavel Begunkov wrote:
> On 22/06/2020 20:11, Jens Axboe wrote:
>>> I think the solution here is to defer the cq ring filling + commit to the
>>> caller instead of deep down the stack, I think that's a nice win in general.
>>> To do that, we need to be able to do it after io_submit_sqes() has been
>>> called. We can either do that inline, by passing down a list or struct
>>> that allows the caller to place the request there instead of filling
>>> the event, or out-of-band by having eg a percpu struct that allows the
>>> same thing. In both cases, the actual call site would do something ala:
>
> I had similar stuff long ago but with a different premise -- it was
> defer-batching io_put_req() without *fill_event(). It also helped to rework
> synchronisation and reduce the number of atomics, and allowed req reuse.
> Probably easier to revive if this sees the light.

I'm going to polish this series a bit, then post it for review.

>>> if (comp_list && successful_completion) {
>>> 	req->result = ret;
>>> 	list_add_tail(&req->list, comp_list);
>>> } else {
>>> 	io_cqring_add_event(req, ret);
>>> 	if (!successful_completion)
>>> 		req_set_fail_links(req);
>>> 	io_put_req(req);
>>> }
>>>
>>> and then have the caller iterate the list and fill completions, if it's
>>> non-empty on return.
>>>
>>> I don't think this is necessarily hard, but to do it nicely it will
>>> touch a bunch of code and hence be quite a bit of churn. I do think the
>>> reward is worth it though, as this applies to the "normal" submission
>>> path as well, not just the SQPOLL variant.
>
> The obvious problem with CQE batching is latency, and it can be especially
> bad for SQPOLL. It could be reasonable to add a "max batch" parameter to
> io_uring, or something in a similar vein.

Yeah, we need some flush-at-N logic, which probably isn't that critical.
32 or something like that would do nicely; it's small enough to not cause
issues, and large enough that we'll amortize the cost of the lock-and-commit
dance nicely.
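For anyone following along, here's a rough userspace sketch of the scheme
being discussed: completions get queued on a caller-owned list instead of
being posted one at a time, and the list is flushed in one lock-and-commit
step once it hits the flush-at-N threshold. This is illustrative only, not
actual io_uring code; the names (struct req, comp_list, complete_req,
flush_completions) and the flushed_total counter standing in for the cq
ring are all made up for the example.

```c
/* Illustrative sketch of deferred, batched completions with flush-at-N.
 * Not actual kernel code: names and the flushed_total counter (a stand-in
 * for the cq ring) are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 32  /* suggested flush threshold from the thread */

struct req {
	int result;
	struct req *next;
};

struct comp_list {
	struct req *head, *tail;
	int count;          /* requests queued since the last flush */
	int flushed_total;  /* CQEs "committed" so far (stand-in for the ring) */
};

/* One lock + fill + commit per batch, instead of per request. */
static void flush_completions(struct comp_list *cl)
{
	struct req *r = cl->head;

	/* would take the completion lock here */
	while (r) {
		cl->flushed_total++;  /* would be the cqring fill here */
		r = r->next;
	}
	cl->head = cl->tail = NULL;
	cl->count = 0;
	/* would commit the ring and drop the lock here */
}

/* Caller-side completion: queue the request, flush once the batch hits N,
 * which bounds the extra latency that batching adds. */
static void complete_req(struct comp_list *cl, struct req *r, int ret)
{
	r->result = ret;
	r->next = NULL;
	if (cl->tail)
		cl->tail->next = r;
	else
		cl->head = r;
	cl->tail = r;
	if (++cl->count >= MAX_BATCH)
		flush_completions(cl);
}
```

The caller would still flush any stragglers after the submission loop
returns, since the last partial batch won't have hit the threshold.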
-- Jens Axboe