Re: [PATCH 6/8] io_uring/net: support multishot for send


 



On 2/28/24 4:49 PM, Jens Axboe wrote:
> On 2/28/24 10:28 AM, Jens Axboe wrote:
>> When I have some time I can do add the append case, or feel free to do
>> that yourself, and I can run some testing with that too.
> 
> I did a quick hack to add the append mode, and by default we get roughly
> ring_size / 2 number of appended vecs, which I guess is expected.
> There's a few complications there:
> 
> 1) We basically need a per-send data structure at this point. While
>    that's not hard to do, at least for my case I'd need to add that just
>    for this case and everything would now need to do it. Perhaps. You
>    can perhaps steal a few low bits and only do it for sendmsg. But why?
>    Because now you have multiple buffer IDs in a send completion, and
>    you need to be able to unravel it. If we always receive and send in
>    order, then it'd always be contiguous, which I took advantage of.
>    Not a huge deal, just mentioning some of the steps I had to go
>    through.
> 
> 2) The iovec. Fine if you have the above data structure, as you can just
>    alloc/realloc -> free when done. I just took the easy path and made
>    the iovec big enough (eg ring size).
> 
> Outside of that, didn't need to make a lot of changes:
> 
>  1 file changed, 39 insertions(+), 17 deletions(-)
> 
> Performance is great, because we get to pass in N (in my case ~65)
> packets per send. No more per packet locking. Which I do think
> highlights that we can do better on the multishot send/sendmsg by
> grabbing buffers upfront and passing them in in one go rather than
> simply loop around calling tcp_sendmsg_locked() for each one.

In terms of absolute numbers, the previous best times were with multishot
send, which ran in 3125 usec. Using either the above approach, or a
hacked-up version of multishot send that uses provided buffers and
bundles them into one send (a la sendmsg), the runtimes are within 1% of
each other and too close to call. But the runtime is around 2320 usec,
or around 25% faster than doing one issue at a time.

This is using the same small packet size of 32 bytes. I just did the
bundled send multishot thing to test, and haven't tested more than that
so far.

-- 
Jens Axboe






