On 2/28/24 6:46 PM, Jens Axboe wrote:
> On 2/28/24 4:49 PM, Jens Axboe wrote:
>> On 2/28/24 10:28 AM, Jens Axboe wrote:
>>> When I have some time I can add the append case, or feel free to do
>>> that yourself, and I can run some testing with that too.
>>
>> I did a quick hack to add the append mode, and by default we get roughly
>> ring_size / 2 number of appended vecs, which I guess is expected.
>> There's a few complications there:
>>
>> 1) We basically need a per-send data structure at this point. While
>>    that's not hard to do, at least for my case I'd need to add that just
>>    for this case and everything would now need to do it. Perhaps. You
>>    can perhaps steal a few low bits and only do it for sendmsg. But why?
>>    Because now you have multiple buffer IDs in a send completion, and
>>    you need to be able to unravel it. If we always receive and send in
>>    order, then it'd always be contiguous, which I took advantage of.
>>    Not a huge deal, just mentioning some of the steps I had to go
>>    through.
>>
>> 2) The iovec. Fine if you have the above data structure, as you can just
>>    alloc/realloc -> free when done. I just took the easy path and made
>>    the iovec big enough (eg ring size).
>>
>> Outside of that, didn't need to make a lot of changes:
>>
>>  1 file changed, 39 insertions(+), 17 deletions(-)
>>
>> Performance is great, because we get to pass in N (in my case ~65)
>> packets per send. No more per packet locking. Which I do think
>> highlights that we can do better on the multishot send/sendmsg by
>> grabbing buffers upfront and passing them in in one go rather than
>> simply looping around calling tcp_sendmsg_locked() for each one.
>
> In terms of absolute numbers, previous best times were multishot send,
> which ran in 3125 usec. Using either the above approach, or a hacked up
> version of multishot send that uses provided buffers and bundles them
> into one send (ala sendmsg), the runtimes are within 1% of each other
> and too close to call. But the runtime is around 2320 usec, or around
> 25% faster than doing one issue at a time.
>
> This is using the same small packet size of 32 bytes. Just did the
> bundled send multishot thing to test, haven't tested more than that so
> far.

Update: I had a bug in the sendmsg with multiple vecs, it did not have a
single msg inflight at the same time, it could be multiple. Which
obviously can't work. That voids the sendmsg append results for now.
Will fiddle with it a bit and get it working, and post the full results
when I have them.

-- 
Jens Axboe
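
For illustration only, here is a rough sketch of the bookkeeping described in point 1 of the quoted text. It is not the code from the hack mentioned above and does not call into liburing. It assumes buffers are always received and sent in order, so one send covers a contiguous run of buffer IDs that may wrap around the ring. The names RING_SIZE, OP_SENDMSG, pack_send() and unravel() are hypothetical, as is the bit layout of the 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE  128u                 /* hypothetical ring size, power of 2 */
#define OP_BITS    2u                   /* low bits "stolen" for an op tag */
#define OP_MASK    ((1u << OP_BITS) - 1)
#define OP_SENDMSG 0x1u                 /* hypothetical tag marking a sendmsg */

/*
 * Pack the op tag (low bits), the first buffer ID and the number of
 * buffers into one 64-bit value that could be carried as user_data.
 * Layout (assumed): bits 0-1 tag, bits 2-17 start_bid, bits 32-47 count.
 */
static uint64_t pack_send(unsigned op, uint16_t start_bid, uint16_t nr_bufs)
{
    assert(nr_bufs <= RING_SIZE);
    return ((uint64_t)nr_bufs << 32) |
           ((uint64_t)start_bid << OP_BITS) |
           (op & OP_MASK);
}

static unsigned unpack_op(uint64_t ud)
{
    return (unsigned)(ud & OP_MASK);
}

static uint16_t unpack_start(uint64_t ud)
{
    return (uint16_t)((ud >> OP_BITS) & 0xffff);
}

static uint16_t unpack_count(uint64_t ud)
{
    return (uint16_t)((ud >> 32) & 0xffff);
}

/*
 * Unravel a completion: write every buffer ID covered by the send into
 * out[] (at most max entries) and return how many were written. Because
 * the run is contiguous, the IDs are start, start + 1, ... modulo the
 * ring size.
 */
static unsigned unravel(uint64_t ud, uint16_t *out, unsigned max)
{
    unsigned n = unpack_count(ud);
    uint16_t bid = unpack_start(ud);
    unsigned i;

    if (n > max)
        n = max;
    for (i = 0; i < n; i++)
        out[i] = (uint16_t)((bid + i) & (RING_SIZE - 1));
    return n;
}
```

With these assumptions, a send that bundles 4 buffers starting at ID 126 in a 128-entry ring unravels to IDs 126, 127, 0 and 1. If sends could complete out of order, the contiguous shortcut would no longer hold and a real per-send structure holding the list of IDs would be needed instead.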