Re: [PATCH V5 4/8] io_uring: support SQE group

On 9/7/24 10:36, Ming Lei wrote:
...
Wrt. ublk, group provides zero copy, and the ublk io (group) is generic
IO; sometimes IO_LINK is really needed & helpful, such as in ublk-nbd,
where send(tcp) requests need to be linked & zc. And we shouldn't limit
IO_LINK for generic io_uring IO.

from nuances as such, which would be quite hard to track, the semantics
of IOSQE_CQE_SKIP_SUCCESS is unclear.

IO group just follows every normal request.

It tries to mimic but groups don't and essentially can't do it the
same way, at least in some aspects. E.g. IOSQE_CQE_SKIP_SUCCESS
usually means that all following will be silenced. What if a
member is CQE_SKIP, should it stop the leader from posting a CQE?
And whatever the answer is, it'll be different from the link's
behaviour.

Here it looks easier than link's:

- only the leader's IOSQE_CQE_SKIP_SUCCESS follows the linked request's rule
- each member just respects the flag for itself, independently of the
leader's


Regardless, let's forbid IOSQE_CQE_SKIP_SUCCESS and linked timeouts
for groups, that can be discussed afterwards.

It should be easy to forbid IOSQE_CQE_SKIP_SUCCESS, which is per-sqe; will
do it in V6.
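
For illustration, a per-sqe rejection could look roughly like the toy check
below. This is a userspace sketch and not the V6 code: the TOY_* flag values
and toy_check_sqe_flags() are made up here, and the group_assembling argument
assumes the submit path knows whether a group is currently being built.

```c
#include <errno.h>

/* Toy flag bits; the values are illustrative, not the kernel's. */
#define TOY_SQE_GROUP		(1u << 0)	/* stands in for the SQE group flag */
#define TOY_CQE_SKIP_SUCCESS	(1u << 1)	/* stands in for IOSQE_CQE_SKIP_SUCCESS */

/*
 * Reject an SQE that is part of a group and also asks for CQE skipping.
 * group_assembling is nonzero while a group is being built, which covers
 * the last member of a group (it does not carry the group flag itself).
 */
static int toy_check_sqe_flags(unsigned int sqe_flags, int group_assembling)
{
	int in_group = group_assembling || (sqe_flags & TOY_SQE_GROUP);

	if (in_group && (sqe_flags & TOY_CQE_SKIP_SUCCESS))
		return -EINVAL;
	return 0;
}
```

Since the flag is per-sqe, the check needs nothing beyond the SQE's own flags
and the current group state.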

I am not sure if it is easy to disallow IORING_OP_LINK_TIMEOUT, which
covers all linked sqes, and group leader could be just one of them.
Can you share any idea about the implementation to forbid LINK_TIMEOUT
for sqe group?

diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 671d6093bf36..83b5fd64b4e9 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -542,6 +542,9 @@ static int __io_timeout_prep(struct io_kiocb *req,
 	data->mode = io_translate_timeout_mode(flags);
 	hrtimer_init(&data->timer, io_timeout_get_clock(data), data->mode);
+	if (is_timeout_link && req->ctx->submit_state.group.head)
+		return -EINVAL;
+
 	if (is_timeout_link) {
 		struct io_submit_link *link = &req->ctx->submit_state.link;

This should do, they already look into the ctx's link list. Just move
it into the "if (is_timeout_link)" block.
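
To make the suggested placement concrete, here is a small userspace model of
the linked-timeout branch with the group check moved inside it. It is only a
sketch: the toy_* types and toy_timeout_prep() are hypothetical stand-ins, and
the two pre-existing checks are modelled loosely on what that block does with
the link list, not copied from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_req {
	bool is_link_timeout;	/* models opcode == IORING_OP_LINK_TIMEOUT */
	bool arm_ltimeout;	/* models REQ_F_ARM_LTIMEOUT being set */
};

struct toy_submit_state {
	struct toy_req *link_head;	/* models submit_state.link.head */
	struct toy_req *link_last;	/* models submit_state.link.last */
	struct toy_req *group_head;	/* models submit_state.group.head */
};

static int toy_timeout_prep(struct toy_submit_state *st, bool is_timeout_link)
{
	if (is_timeout_link) {
		/* New check, placed inside the block as suggested. */
		if (st->group_head)
			return -EINVAL;
		/* A linked timeout needs a preceding request to attach to. */
		if (!st->link_head)
			return -EINVAL;
		/* Two linked timeouts in a row are rejected. */
		if (st->link_last->is_link_timeout)
			return -EINVAL;
		st->link_last->arm_ltimeout = true;
	}
	return 0;
}
```

Plain (non-link) timeouts never look at the group state in this model, which
is the point of moving the check into the block.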


1) fail in linked chain
- follows IO_LINK's behavior since io_fail_links() covers io group

2) otherwise
- just respect IOSQE_CQE_SKIP_SUCCESS

And also it doesn't work with IORING_OP_LINK_TIMEOUT.

REQ_F_LINK_TIMEOUT can work on whole group(or group leader) only, and I
will document it in V6.

It would still be troublesome. When a linked timeout fires it searches
for the request it's attached to and cancels it, however, group leaders
that queued up their members are not discoverable. But let's say you can find
them in some way, then the only sensible thing to do is cancel members,
which should be doable by checking req->grp_leader, but might be easier
to leave it to follow up patches.

We have changed sqe group to start queuing members after the leader is
completed. A link timeout will cancel the leader with all its members via
leader->grp_link, so this behavior should respect IORING_OP_LINK_TIMEOUT
completely.

Please see io_fail_links() and io_cancel_group_members().
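
As a rough picture of cancelling a leader together with its members through
grp_link, consider the userspace sketch below. The toy_req type and
toy_cancel_group() are hypothetical and do not reproduce io_fail_links() or
io_cancel_group_members(); the only thing taken from the patch is the idea
that members are chained behind the leader via grp_link.

```c
#include <errno.h>
#include <stddef.h>

struct toy_req {
	int res;			/* completion result, 0 until set */
	struct toy_req *grp_link;	/* next request in the group, or NULL */
};

/* Cancel the leader and every member chained behind it; returns the count. */
static int toy_cancel_group(struct toy_req *lead)
{
	int cancelled = 0;

	for (struct toy_req *req = lead; req; req = req->grp_link) {
		req->res = -ECANCELED;
		cancelled++;
	}
	return cancelled;
}
```

With a leader followed by two members, the walk visits three requests and
each ends up with -ECANCELED as its result.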



+
+		lead->grp_refs += 1;
+		group->last->grp_link = req;
+		group->last = req;
+
+		if (req->flags & REQ_F_SQE_GROUP)
+			return NULL;
+
+		req->grp_link = NULL;
+		req->flags |= REQ_F_SQE_GROUP;
+		group->head = NULL;
+		if (lead->flags & REQ_F_FAIL) {
+			io_queue_sqe_fallback(lead);

Let's say the group was in the middle of a link, it'll
complete that group and continue with assembling / executing
the link when it should've failed it and honoured the
request order.

OK, here we can simply remove the above two lines, and link submit
state can handle this failure in link chain.

If you just delete them, nobody would check for REQ_F_FAIL and
fail the request.

io_link_assembling() & io_link_sqe() check for REQ_F_FAIL and call
io_queue_sqe_fallback() whether or not the request is in a link
chain.

The case we're talking about is failing a group, which is
also in the middle of a link.

LINK_HEAD -> {GROUP_LEAD, GROUP_MEMBER}

Let's say GROUP_MEMBER fails and sets REQ_F_FAIL on the lead,
then v5 does:

if (lead->flags & REQ_F_FAIL) {
	io_queue_sqe_fallback(lead);
	return NULL;
}

In which case it posts cqes for GROUP_LEAD and GROUP_MEMBER,
and then tries to execute LINK_HEAD (without failing it), which
is wrong. So first we need:

if (state.linked_link.head)
	req_fail_link_node(state.linked_link.head);

And then we can't just remove io_queue_sqe_fallback(), because
when a group is not linked there would be no io_link_sqe()
to fail it. You can do:


io_group_sqe()
{
	if ((lead->flags & REQ_F_FAIL) && !ctx->state.link.head) {
		io_queue_sqe_fallback(lead);
		return NULL;
	}
	...
}

but it's much cleaner to move REQ_F_FAIL out of group assembling.
We'd also want to move the same REQ_F_FAIL / io_queue_sqe_fallback()
out of io_link_sqe(), but I didn't mention it because it's not
strictly required for your set AFAIR.
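
The two cases above (failed group inside a link vs. failed standalone group)
can be modelled in userspace roughly as follows. This is a sketch under
assumptions: toy_req, toy_state and toy_group_done() are hypothetical names,
TOY_F_FAIL stands in for REQ_F_FAIL, and the queued_fallback field merely
records where the kernel would call io_queue_sqe_fallback().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_F_FAIL	(1u << 0)	/* stands in for REQ_F_FAIL */

struct toy_req {
	unsigned int flags;
	bool queued_fallback;	/* marks an io_queue_sqe_fallback()-like call */
};

struct toy_state {
	struct toy_req *link_head;	/* non-NULL while a link is assembled */
};

/*
 * Called once a group is fully assembled. Returns the leader if submission
 * should continue into link handling, or NULL if the group was consumed here.
 */
static struct toy_req *toy_group_done(struct toy_state *st, struct toy_req *lead)
{
	if (lead->flags & TOY_F_FAIL) {
		if (st->link_head) {
			/* Propagate the failure so the whole link is failed. */
			st->link_head->flags |= TOY_F_FAIL;
		} else {
			/* Standalone group: nobody else would fail it. */
			lead->queued_fallback = true;
			return NULL;
		}
	}
	return lead;
}
```

In the LINK_HEAD -> {GROUP_LEAD, GROUP_MEMBER} case this marks LINK_HEAD as
failed and hands the leader on, instead of completing the group early.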


Assuming you'd also set the fail flag on the
link head when appropriate, how about deleting these two lines
and doing it like below? (can be further prettified)


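bool io_group_assembling(struct io_submit_state *state, struct io_kiocb *req)
{
	return state->group.head || (req->flags & REQ_F_SQE_GROUP);
}
bool io_link_assembling(struct io_submit_state *state, struct io_kiocb *req)
{
	return state->link.head || (req->flags & IO_REQ_LINK_FLAGS);
}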

static inline int io_submit_sqe()
{
	...
	if (unlikely(io_link_assembling(state, req) ||
				 io_group_assembling(state, req) ||
				 req->flags & REQ_F_FAIL)) {
		if (io_group_assembling(state, req)) {
			req = io_group_sqe(&state->group, req);
			if (!req)
				return 0;
		}
		if (io_link_assembling(state, req)) {
			req = io_link_sqe(&state->link, req);
			if (!req)
				return 0;
		}
		if (req->flags & REQ_F_FAIL) {
			io_queue_sqe_fallback(req);
			return 0;

As I mentioned above, io_link_assembling() & io_link_sqe() covers
the failure handling.

--
Pavel Begunkov



