On 10/20/22 15:51, Stefan Metzmacher wrote:
Hi Pavel,
So far I came up with a IORING_SEND_NOTIF_REPORT_USAGE opt-in flag
and the reporting is done in cqe.res with IORING_NOTIF_USAGE_ZC_USED (0x00000001)
and/or IORING_NOTIF_USAGE_ZC_COPIED (0x8000000). So the caller is also
able to notice that some parts were able to use zero copy, while other
fragments were copied.
Are we really interested in multihoming and probably some very edge cases?
I'd argue we're not and it should be a single bool hint indicating whether
zc is viable or not. It can do more complex calculations _if_ needed, e.g.
looking inside skb's and figure out how many bytes were copied but as for me
it should better be turned into a single bool in the end. Could also be the
number of bytes copied, but I don't think we can't have the accuracy for
that (e.g. what we're going to return if some protocol duplicates an skb
and sends to 2 different devices or is processing it in a pipeline?)
So the question is what is the use case for having 2 flags?
It's mostly for debugging.
Ok, than it sounds like we don't need it.
btw, now we've got another example why the report flag is a good idea,
I don't understand that line...
I'm just telling that IORING_SEND_NOTIF_* instead of unconditional reporting
is more flexible and extendible from the uapi perspective.
we can't use cqe.res unconditionally because we want to have a "one CQE
per request" mode, but it's fine if we make it and the report flag
mutually exclusive.
You mean we can add an optimized case where SEND[MSG]_ZC would not
generate F_MORE and skips F_NOTIF, because we copied or the transmission
path was really fast?
It is rather about optionally omitting the first (aka completion) cqe and
posting only the notification cqe, which makes a lot of sense for UDP and
some TCP use cases.
Then I'd move to IORING_CQE_F_COPIED again...
[...]
-struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
+static void __io_notif_complete_tw_report_usage(struct io_kiocb *notif, bool *locked)
Just shove all that into __io_notif_complete_tw().
Ok, and then optimze later?
Right, I'm just tired of back porting patches by hand :)
Otherwise we could have IORING_CQE_F_COPIED by default without opt-in
flag...
+static void io_uring_tx_zerocopy_callback_report_usage(struct sk_buff *skb,
+ struct ubuf_info *uarg,
+ bool success)
+{
+ struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
+
+ if (success && !nd->zc_used && skb)
+ nd->zc_used = true;
+ else if (unlikely(!success && !nd->zc_copied))
+ nd->zc_copied = true;
It's fine but racy, so let's WRITE_ONCE() to indicate it.
I don't see how this could be a problem, but I can add it.
It's not a problem, but better to be a little be more explicit
about parallel writes.
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 5b4d710c8ca5..5ac7a2745e52 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -13,10 +13,12 @@ struct io_notif_data {
struct file *file;
struct ubuf_info uarg;
unsigned long account_pages;
+ bool zc_used;
+ bool zc_copied;
IIRC io_notif_data is fully packed in 6.1, so placing zc_{used,copied}
there might complicate backporting (if any). We can place them in io_kiocb
directly and move in 6.2. Alternatively account_pages doesn't have to be
long.
As far as I can see kernel-dk-block/io_uring-6.1 alread has your
shrink patches included...
Sorry, I mean 6.0
--
Pavel Begunkov