Re: [RFC 2/2] io_uring/net: allow to override notification tag

On 8/19/22 13:36, Stefan Metzmacher wrote:
[...]
What do you think? It would remove the whole notif slot complexity
for callers that use IORING_RECVSEND_NOTIF_FLUSH for every request anyway.

The downside is that requests then need to be pretty large or it'll
lose performance. Surely not a problem at 8MB per request, but
even 4KB won't suffice. And users may want to put smaller chunks
on the wire instead of waiting for more data, to let TCP handle
pacing and potentially improve latencies by sending earlier.

If this is optional, applications can decide what fits better.

On the other hand, the one-notification-per-request idea mentioned
before can be extended to 1-2 CQEs per request, which is, interestingly,
the approach the zc send discussions started with.

In order to make use of any of this I need some way
to get 2 CQEs with user_data being the same or related.

The idea described above will post 2 CQEs per request (mostly),
as you want, with an optional way to have only 1 CQE. My current
sentiment is to kill all the slot business, leave this 1-2 CQEs
per request and see if there are users for whom it won't be
enough. It's anyway just a slight deviation from what I wanted
to push as a complementary interface.
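
As a minimal sketch of what consuming such paired CQEs could look
like, assuming the IORING_CQE_F_MORE / IORING_CQE_F_NOTIF convention
and liburing helpers that later landed upstream, not this RFC's slot
interface:

/* Sketch only: assumes the post-RFC convention where the send CQE has
 * IORING_CQE_F_MORE set and a second CQE with IORING_CQE_F_NOTIF and the
 * same user_data follows once the kernel is done with the buffer. */
#include <liburing.h>

static void reap_send_zc_cqes(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		if (cqe->flags & IORING_CQE_F_NOTIF) {
			/* notification CQE: the buffer tied to this
			 * user_data may now be reused */
		} else {
			/* main CQE: cqe->res is the send result; F_MORE
			 * means a notif CQE with the same user_data will
			 * still arrive, so don't recycle the buffer yet */
			if (!(cqe->flags & IORING_CQE_F_MORE)) {
				/* no notif coming (e.g. failed early),
				 * the buffer is free immediately */
			}
		}
		io_uring_cqe_seen(ring, cqe);
	}
}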

Ah, ok, removing the slot stuff again would be fine for me...

The only benefit of slots is being able to avoid or
batch additional CQEs, correct? Or is there more to it?

CQE batching is a lesser problem; I'm more concerned about how
it plays with the network stack. In short, it'll hugely underperform
with TCP if requests are not large enough.

A simple bench with some hacks, localhost, TCP, run with:

./msg_zerocopy -6 -r tcp -s <size> &
./io_uring_zerocopy_tx -6 -D "::1" -s <size> -m <0,2> tcp


non-zerocopy:
4000B:  tx=8711880 (MB=33233), tx/s=1742376 (MB/s=6646)
16000B: tx=3196528 (MB=48775), tx/s=639305 (MB/s=9755)
60000B: tx=1036536 (MB=59311), tx/s=207307 (MB/s=11862)

zerocopy:
4000B:  tx=3003488 (MB=11457), tx/s=600697 (MB/s=2291)
16000B: tx=2940296 (MB=44865), tx/s=588059 (MB/s=8973)
60000B: tx=2621792 (MB=150020), tx/s=524358 (MB/s=30004)

So with something between 16k and 60k we reach the point where
ZC starts to be faster, correct?

For this setup -- yes, it should be somewhere around 16-20K.
I don't remember the numbers for real hw, but I saw similar
tendencies.

Did you remove the loopback restriction as described in
Documentation/networking/msg_zerocopy.rst ?

Right, it wouldn't outperform even with large payloads otherwise.

Are the results similar when using ./msg_zerocopy -6 tcp -s <size>
as client?

It shouldn't be; the msg_zerocopy client also batches multiple
requests into a single (internal) notification and exposes it
to userspace differently.
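
For comparison, a rough sketch of the MSG_ZEROCOPY flow that client
follows, per Documentation/networking/msg_zerocopy.rst: a single
errqueue notification can cover a whole range of send calls
(ee_info..ee_data), which is the batching referred to above.
reuse_buffers() is a made-up application-side placeholder:

/* Rough MSG_ZEROCOPY usage as documented in msg_zerocopy.rst;
 * error handling trimmed, SO_ZEROCOPY needs reasonably recent headers. */
#include <sys/socket.h>
#include <linux/errqueue.h>

extern void reuse_buffers(unsigned lo, unsigned hi); /* hypothetical */

static void send_zc_and_reap(int fd, const void *buf, size_t len)
{
	int one = 1;
	char control[100] = { 0 };
	struct msghdr msg = {
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	struct cmsghdr *cm;
	struct sock_extended_err *serr;

	setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
	send(fd, buf, len, MSG_ZEROCOPY);

	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return;

	cm = CMSG_FIRSTHDR(&msg);
	serr = (struct sock_extended_err *)CMSG_DATA(cm);
	if (serr->ee_origin == SO_EE_ORIGIN_ZEROCOPY)
		/* one notification for sends ee_info..ee_data inclusive */
		reuse_buffers(serr->ee_info, serr->ee_data);
}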

And the reason is some page pinning overhead from iov_iter_get_pages2()
in __zerocopy_sg_from_iter()?

No, I was using registered buffers here, so instead of the
iov_iter_get_pages2() business zerocopy was going through
io_uring/net.c:io_sg_from_iter(). And in any case, pinning overhead
wouldn't drastically change it.
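
For readers following along, "registered buffers" means roughly the
pattern below; this is a sketch using the helpers that later landed
in liburing (io_uring_register_buffers(), io_uring_prep_send_zc_fixed()),
not the interface under discussion in this RFC:

/* Sketch: pre-registering the buffer pins its pages once up front, so
 * the zc send path builds the skb from the fixed buffer
 * (io_sg_from_iter()) instead of calling iov_iter_get_pages2() per
 * request. */
#include <sys/uio.h>
#include <liburing.h>

static int send_zc_registered(struct io_uring *ring, int sockfd,
			      void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct io_uring_sqe *sqe;
	int ret;

	ret = io_uring_register_buffers(ring, &iov, 1); /* one-off pinning */
	if (ret)
		return ret;

	sqe = io_uring_get_sqe(ring);
	io_uring_prep_send_zc_fixed(sqe, sockfd, buf, len, 0 /* msg flags */,
				    0 /* zc flags */, 0 /* buf_index */);
	return io_uring_submit(ring);
}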

Reusing notifications with slots will change the picture.
And this has nothing to do with io_uring overhead like
CQE posting and so on.

Hmm, I don't understand how the number of notif structures
would have any impact. Is it related to io_sg_from_iter()?

It comes from the TCP stack forcibly changing skbuffs every time
it meets a new ubuf_info (i.e. a notification handle, for
simplicity). There is a slight bump in skb allocation overhead,
but the main problem seemingly comes from tcp_push and so on,
feeding it down the stack. I don't think there is any fundamental
reason why it should work so much slower, but it might be
problematic from an engineering perspective. I'll ask around
a bit or maybe look myself if I find time for that.
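
Concretely, the rule being described is that an skb can only point
at one ubuf_info; the sketch below loosely paraphrases the
skb_zerocopy_iter_stream() / tcp_sendmsg_locked() interplay from
kernels of that era. append_to_skb_zc() is a made-up name for
illustration, not the real kernel function:

/* Paraphrase, not a literal kernel quote: when the tail skb already
 * carries a different ubuf_info, the new data cannot be appended to it;
 * TCP marks the skb for push and starts a new segment, which is where
 * the per-request notification cost shows up. */
static int append_to_skb_zc(struct sk_buff *skb, struct ubuf_info *uarg)
{
	struct ubuf_info *orig_uarg = skb_zcopy(skb);

	if (orig_uarg && uarg != orig_uarg)
		return -EEXIST;	/* caller: mark push, go to a new segment */

	/* ... copy user pages into skb frags and attach uarg ... */
	return 0;
}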

--
Pavel Begunkov


