On Mon, Jul 29, 2024 at 02:58:58PM +0100, Pavel Begunkov wrote: > On 7/25/24 11:33, Ming Lei wrote: > > On Wed, Jul 24, 2024 at 02:41:38PM +0100, Pavel Begunkov wrote: > > > On 7/23/24 15:34, Ming Lei wrote: > ... > > > > But grp_refs is dropped after io-wq request reference drops to > > > > zero, then both io-wq and nor-io-wq code path can be unified > > > > wrt. dealing with grp_refs, meantime it needn't to be updated > > > > in extra(io-wq) context. > > > > > > Let's try to describe how it can work. First, I'm only describing > > > the dep mode for simplicity. And for the argument's sake we can say > > > that all CQEs are posted via io_submit_flush_completions. > > > > > > io_req_complete_post() { > > > if (flags & GROUP) { > > > req->io_task_work.func = io_req_task_complete; > > > io_req_task_work_add(req); > > > return; > > > } > > > ... > > > } > > > > OK. > > > > io_wq_free_work() still need to change to not deal with > > next link & ignoring skip_cqe, because group handling( > > No, it doesn't need to know about all that. > > > cqe posting, link advance) is completely moved into > > io_submit_flush_completions(). > > It has never been guaranteed that io_req_complete_post() > will be the one completing the request, > io_submit_flush_completions() can always happen. > > > struct io_wq_work *io_wq_free_work(struct io_wq_work *work) > { > ... > if (req_ref_put_and_test(req)) { > nxt = io_req_find_next(req); > io_free_req(); > } > } > > We queue linked requests only when all refs are dropped, and > the group handling in my snippet is done before we drop the > owner's reference. > > IOW, you won't hit io_free_req() in io_wq_free_work() for a > leader unless all members in its group got completed and > the leader already went through the code dropping those shared > ublk buffers. If io_free_req() won't be called for leader, leader won't be added to ->compl_reqs, and it has to be generated when all members are completed in __io_submit_flush_completions(). For !io_wq, we can align to this way by not completing leader in io_req_complete_defer(). The above implementation looks simpler, and more readable. > > > > > You can do it this way, nobody would ever care, and it shouldn't > > > affect performance. Otherwise everything down below can probably > > > be extended to io_req_complete_post(). > > > > > > To avoid confusion in terminology, what I call a member below doesn't > > > include a leader. IOW, a group leader request is not a member. > > > > > > At the init we have: > > > grp_refs = nr_members; /* doesn't include the leader */ > > > > > > Let's also say that the group leader can and always goes > > > through io_submit_flush_completions() twice, just how it's > > > with your patches. > > > > > > 1) The first time we see the leader in io_submit_flush_completions() > > > is when it's done with resource preparation. For example, it was > > > doing some IO into a buffer, and now is ready to give that buffer > > > with data to group members. At this point it should queue up all group > > > members. And we also drop 1 grp_ref. There will also be no > > > io_issue_sqe() for it anymore. > > > > Ok, then it is just the case with dependency. > > > > > > > > 2) Members are executed and completed, in io_submit_flush_completions() > > > they drop 1 grp_leader->grp_ref reference each. > > > > > > 3) When all members complete, leader's grp_ref becomes 0. Here > > > the leader is queued for io_submit_flush_completions() a second time, > > > at which point it drops ublk buffers and such and gets destroyed. > > > > > > You can post a CQE in 1) and then set CQE_SKIP. Can also be fitted > > > into 3). A pseudo code for when we post it in step 1) > > > > This way should work, but it confuses application because > > the leader is completed before all members: > > > > - leader usually provide resources in group wide > > - member consumes this resource > > - leader is supposed to be completed after all consumer(member) are > > done. > > > > Given it is UAPI, we have to be careful with CQE posting order. > > That's exactly what I was telling about the uapi previously, > I don't believe we want to inverse the natural CQE order. > > Regardless, the order can be inversed like this: > > __io_flush_completions() { > ... > if (req->flags & (SKIP_CQE|GROUP)) { > if (req->flags & SKIP_CQE) > continue; > // post a CQE only when we see it for a 2nd time in io_flush_completion(); > if (is_leader() && !(req->flags & ALREADY_SEEN)) > continue; I am afraid that ALREADY_SEEN can't work, since leader can only be added to ->compl_reqs once. Thanks, Ming