Re: [PATCH V5 4/8] io_uring: support SQE group

Pavel Begunkov <asml.silence@xxxxxxxxx> · Tue, 10 Sep 2024 21:31:45 +0100

On 9/10/24 16:04, Ming Lei wrote:
On Tue, Sep 10, 2024 at 02:12:53PM +0100, Pavel Begunkov wrote:
On 9/7/24 10:36, Ming Lei wrote:
...
Wrt. ublk, group provides zero copy, and the ublk io(group) is generic
IO, sometimes IO_LINK is really needed & helpful, such as in ublk-nbd,
send(tcp) requests need to be linked & zc. And we shouldn't limit IO_LINK
for generic io_uring IO.

from nuances as such, which would be quite hard to track, the semantics
of IOSQE_CQE_SKIP_SUCCESS is unclear.

IO group just follows every normal request.

It tries to mimic but groups don't and essentially can't do it the
same way, at least in some aspects. E.g. IOSQE_CQE_SKIP_SUCCESS
usually means that all following will be silenced. What if a
member is CQE_SKIP, should it stop the leader from posting a CQE?
And whatever the answer is, it'll be different from the link's
behaviour.

Here it looks easier than link's:

- only leader's IOSQE_CQE_SKIP_SUCCESS follows linked request's rule
- all members just respect the flag for their own CQE, independently of
the leader's
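
The member half of the rule above can be modelled in userspace C. This is only a sketch under stated assumptions: `struct req`, its fields, and `member_posts_cqe()` are hypothetical names made up for illustration, not kernel code, and the leader's link-style handling is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct io_kiocb. */
struct req {
	bool cqe_skip;          /* stands in for IOSQE_CQE_SKIP_SUCCESS */
	int res;                /* completion result, negative on failure */
	struct req *grp_leader; /* NULL when this request is the leader */
};

/*
 * A member decides from its own flag and result only; the leader's
 * flag is deliberately never read.
 */
static bool member_posts_cqe(const struct req *member)
{
	assert(member != NULL && member->grp_leader != NULL);
	return !(member->cqe_skip && member->res >= 0);
}
```

In this model a member with the flag set stays silent on success and still posts a CQE on failure, whatever the leader's flag says.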


Regardless, let's forbid IOSQE_CQE_SKIP_SUCCESS and linked timeouts
for groups, that can be discussed afterwards.

It should be easy to forbid IOSQE_CQE_SKIP_SUCCESS, which is per-sqe; will do
it in V6.

I am not sure if it is easy to disallow IORING_OP_LINK_TIMEOUT, which
covers all linked sqes, and group leader could be just one of them.
Can you share any idea about the implementation to forbid LINK_TIMEOUT
for sqe group?

diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 671d6093bf36..83b5fd64b4e9 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -542,6 +542,9 @@ static int __io_timeout_prep(struct io_kiocb *req,
  	data->mode = io_translate_timeout_mode(flags);
  	hrtimer_init(&data->timer, io_timeout_get_clock(data), data->mode);
+	if (is_timeout_link && req->ctx->submit_state.group.head)
+		return -EINVAL;
+
  	if (is_timeout_link) {
  		struct io_submit_link *link = &req->ctx->submit_state.link;

This should do, they already look into the ctx's link list. Just move
it into the "if (is_timeout_link)" block.
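
A userspace sketch of where the check ends up after that move. Everything here is a stand-in: `struct submit_state`, its fields, and `timeout_prep_model()` are hypothetical, and the `link_head` test is only assumed to resemble the existing link-list check in that block.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's request and submit state. */
struct req { int unused; };

struct submit_state {
	struct req *link_head;  /* head of the link being assembled */
	struct req *group_head; /* head of the group being assembled */
};

static int timeout_prep_model(const struct submit_state *state,
			      bool is_timeout_link)
{
	assert(state != NULL);

	if (is_timeout_link) {
		/* New check: no linked timeout while a group is open. */
		if (state->group_head)
			return -EINVAL;
		/* Assumption: a linked timeout needs a link to attach to. */
		if (!state->link_head)
			return -EINVAL;
	}
	return 0;
}
```

Keeping both checks under one `if (is_timeout_link)` means plain timeouts never touch the group state.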

OK.



1) fail in linked chain
- follows IO_LINK's behavior since io_fail_links() covers io group

2) otherwise
- just respect IOSQE_CQE_SKIP_SUCCESS

And also it doesn't work with IORING_OP_LINK_TIMEOUT.

REQ_F_LINK_TIMEOUT can work on whole group(or group leader) only, and I
will document it in V6.

It would still be troublesome. When a linked timeout fires it searches
for the request it's attached to and cancels it, however, group leaders
that queued up their members are not discoverable. But let's say you can find
them in some way, then the only sensible thing to do is cancel members,
which should be doable by checking req->grp_leader, but might be easier
to leave it to follow up patches.

We have changed sqe group to start queuing members after leader is
completed. link timeout will cancel leader with all its members via
leader->grp_link, this behavior should respect IORING_OP_LINK_TIMEOUT
completely.

Please see io_fail_links() and io_cancel_group_members().
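
As a rough illustration of cancelling a leader together with its members through the grp_link chain, here is a userspace sketch. It is not the body of io_cancel_group_members(): `struct req`, the `cancelled` field, and `cancel_group_model()` are hypothetical, and only the chain walk is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a grouped request. */
struct req {
	bool cancelled;
	struct req *grp_link; /* next request in the group, NULL at the end */
};

/* Mark the leader and every member reachable via grp_link; return the count. */
static int cancel_group_model(struct req *lead)
{
	int n = 0;

	assert(lead != NULL);
	for (struct req *r = lead; r != NULL; r = r->grp_link) {
		r->cancelled = true;
		n++;
	}
	return n;
}
```

The walk only works if the members are still chained to the leader at the time the timeout fires, which is the point being argued above.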



+
+		lead->grp_refs += 1;
+		group->last->grp_link = req;
+		group->last = req;
+
+		if (req->flags & REQ_F_SQE_GROUP)
+			return NULL;
+
+		req->grp_link = NULL;
+		req->flags |= REQ_F_SQE_GROUP;
+		group->head = NULL;
+		if (lead->flags & REQ_F_FAIL) {
+			io_queue_sqe_fallback(lead);

Let's say the group was in the middle of a link, it'll
complete that group and continue with assembling / executing
the link when it should've failed it and honoured the
request order.

OK, here we can simply remove the above two lines, and link submit
state can handle this failure in link chain.

If you just delete then nobody would check for REQ_F_FAIL and
fail the request.

io_link_assembling() & io_link_sqe() check for REQ_F_FAIL and call
io_queue_sqe_fallback() whether or not the request is in a link
chain.

The case we're talking about is failing a group, which is
also in the middle of a link.

LINK_HEAD -> {GROUP_LEAD, GROUP_MEMBER}

Let's say GROUP_MEMBER fails and sets REQ_F_FAIL on the lead,
then v5 does:

if (lead->flags & REQ_F_FAIL) {
	io_queue_sqe_fallback(lead);
	return NULL;
}

In which case it posts cqes for GROUP_LEAD and GROUP_MEMBER,
and then tries to execute LINK_HEAD (without failing it), which
is wrong. So first we need:

if (state.linked_link.head)
	req_fail_link_node(state.linked_link.head);

For group leader, link advancing is always done via io_queue_next(), in
which io_disarm_next() is called to fail the whole remaining link
if the current request is marked as FAIL.


And then we can't just remove io_queue_sqe_fallback(), because
when a group is not linked there would be no io_link_sqe()
to fail it. You can do:

If one request in group is marked as FAIL, io_link_assembling()
will return true, and io_link_sqe() will fail it.
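
The claim can be pictured with a small userspace predicate. This is a guess at the shape of the test, not the real io_link_assembling(): the `MODEL_F_*` flags, both structs, and `link_assembling_model()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the REQ_F_* link and fail flags. */
enum {
	MODEL_F_LINK = 1u << 0,
	MODEL_F_FAIL = 1u << 1,
};

struct req { unsigned int flags; };
struct submit_state { struct req *link_head; };

/* True if the request must go through the link path. */
static bool link_assembling_model(const struct submit_state *state,
				  const struct req *req)
{
	assert(state != NULL && req != NULL);
	return state->link_head != NULL ||
	       (req->flags & (MODEL_F_LINK | MODEL_F_FAIL)) != 0;
}
```

With a predicate like this, a failed request takes the link path even when no link was ever requested and no link head exists.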

Hmm, you're right, even though it's not a great way of doing it,
i.e. pushing a req into io_link_sqe() even when linking has never
been requested, but that's fine. I can drop a quick patch on
top if it bothers me.

--
Pavel Begunkov