Re: [PATCH 6/8] io_uring/net: support multishot for send

Jens Axboe <axboe@xxxxxxxxx> · Thu, 29 Feb 2024 08:42:18 -0700

On 2/28/24 6:46 PM, Jens Axboe wrote:
> On 2/28/24 4:49 PM, Jens Axboe wrote:
>> On 2/28/24 10:28 AM, Jens Axboe wrote:
>>> When I have some time I can add the append case, or feel free to do
>>> that yourself, and I can run some testing with that too.
>>
>> I did a quick hack to add the append mode, and by default we get roughly
>> ring_size / 2 number of appended vecs, which I guess is expected.
>> There's a few complications there:
>>
>> 1) We basically need a per-send data structure at this point. While
>>    that's not hard to do, at least for my case I'd need to add that just
>>    for this case and everything would now need to do it. Perhaps. You
>>    can perhaps steal a few low bits and only do it for sendmsg. But why?
>>    Because now you have multiple buffer IDs in a send completion, and
>>    you need to be able to unravel it. If we always receive and send in
>>    order, then it'd always be contiguous, which I took advantage of.
>>    Not a huge deal, just mentioning some of the steps I had to go
>>    through.
>>
>> 2) The iovec. Fine if you have the above data structure, as you can just
>>    alloc/realloc -> free when done. I just took the easy path and made
>>    the iovec big enough (eg ring size).
>>
>> Outside of that, didn't need to make a lot of changes:
>>
>>  1 file changed, 39 insertions(+), 17 deletions(-)
>>
>> Performance is great, because we get to pass in N (in my case ~65)
>> packets per send. No more per packet locking. Which I do think
>> highlights that we can do better on the multishot send/sendmsg by
>> grabbing buffers upfront and passing them in in one go rather than
>> simply loop around calling tcp_sendmsg_locked() for each one.
> 
> In terms of absolute numbers, previous best times were multishot send,
> which ran in 3125 usec. Using either the above approach, or a hacked up
> version of multishot send that uses provided buffers and bundles them
> into one send (ala sendmsg), the runtimes are within 1% of each other
> and too close to call. But the runtime is around 2320 usec, or around
> 25% faster than doing one issue at a time.
> 
> This is using the same small packet size of 32 bytes. Just did the
> bundled send multishot thing to test, haven't tested more than that so
> far.

Update: I had a bug in the sendmsg with multiple vecs: it did not limit
itself to a single msg inflight at a time, there could be multiple.
Which obviously can't work. That voids the sendmsg append results for now.
Will fiddle with it a bit and get it working, and post the full results
when I have them.
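
For reference, the bundling and "unravel" steps discussed above can be
sketched roughly as below. This is only an illustration and not the
actual patch: struct send_bundle, bundle_prep(), bundle_done(), the ring
size of 8 and the bufs[] array are all made-up names and values, and it
assumes buffers are received and sent in order so that one send covers a
contiguous (wrapping) range of buffer IDs.

```c
#include <stddef.h>
#include <sys/uio.h>

#define NR_BUFS  8               /* hypothetical ring size, power of two */
#define BUF_MASK (NR_BUFS - 1)
#define BUF_SIZE 32              /* small packets, as in the test above */

/* Hypothetical per-send state: which buffer IDs one send covers. */
struct send_bundle {
	unsigned start_bid;
	unsigned nr_bufs;
};

static char bufs[NR_BUFS][BUF_SIZE];

/* Map nr in-order buffers, starting at start_bid, into iov[]. */
static struct send_bundle bundle_prep(struct iovec *iov, unsigned start_bid,
				      unsigned nr, const size_t *lens)
{
	struct send_bundle b = { start_bid & BUF_MASK, nr };
	unsigned i;

	for (i = 0; i < nr; i++) {
		unsigned bid = (start_bid + i) & BUF_MASK;

		iov[i].iov_base = bufs[bid];
		iov[i].iov_len = lens[bid];
	}
	return b;
}

/*
 * Given the byte count from the send completion, return how many
 * buffers were fully sent. *last_bid is only set if that is non-zero.
 */
static unsigned bundle_done(const struct send_bundle *b,
			    const struct iovec *iov, size_t sent,
			    unsigned *last_bid)
{
	unsigned n = 0;

	while (n < b->nr_bufs && sent >= iov[n].iov_len) {
		sent -= iov[n].iov_len;
		n++;
	}
	if (n)
		*last_bid = (b->start_bid + n - 1) & BUF_MASK;
	return n;
}
```

With in-order buffers the per-send state is just a start ID and a
count. If buffers could be sent out of order, this would need a list of
IDs instead, which is the per-send data structure mentioned in 1).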

-- 
Jens Axboe