Re: [PATCH 2/5] io_uring/net: add provided buffer support for IORING_OP_SEND

Gabriel Krisman Bertazi <krisman@xxxxxxx> · Thu, 25 Apr 2024 13:56:40 +0200

Jens Axboe <axboe@xxxxxxxxx> writes:

> It's pretty trivial to wire up provided buffer support for the send
> side, just like how it's done on the receive side. This enables setting up
> a buffer ring that an application can use to push pending sends to,
> and then have a send pick a buffer from that ring.
>
> One of the challenges with async IO and networking sends is that you
> can get into reordering conditions if you have more than one inflight
> at the same time. Consider the following scenario where everything is
> fine:
>
> 1) App queues sendA for socket1
> 2) App queues sendB for socket1
> 3) App does io_uring_submit()
> 4) sendA is issued, completes successfully, posts CQE
> 5) sendB is issued, completes successfully, posts CQE
>
> All is fine. Requests are always issued in-order, and both complete
> inline as most sends do.
>
> However, if we're flooding socket1 with sends, the following could
> also result from the same sequence:
>
> 1) App queues sendA for socket1
> 2) App queues sendB for socket1
> 3) App does io_uring_submit()
> 4) sendA is issued, socket1 is full, poll is armed for retry
> 5) Space frees up in socket1, this triggers sendA retry via task_work
> 6) sendB is issued, completes successfully, posts CQE
> 7) sendA is retried, completes successfully, posts CQE
>
> Now we've sent sendB before sendA, which can make things unhappy. If
> both sendA and sendB had been using provided buffers, then it would look
> as follows instead:
>
> 1) App queues dataA for sendA, queues sendA for socket1
> 2) App queues dataB for sendB, queues sendB for socket1
> 3) App does io_uring_submit()
> 4) sendA is issued, socket1 is full, poll is armed for retry
> 5) Space frees up in socket1, this triggers sendA retry via task_work
> 6) sendB is issued, picks first buffer (dataA), completes successfully,
>    posts CQE (which says "I sent dataA")
> 7) sendA is retried, picks first buffer (dataB), completes successfully,
>    posts CQE (which says "I sent dataB")

Hi Jens,

If I understand correctly, when sending a buffer, we set sr->len to the
smaller of the buffer size and what was requested in sqe->len. But once
we disconnect the buffer from the request, we can get into a situation
where the buffers and requests mismatch, and only part of a buffer gets
sent.

Say we are sending two buffers through non-bundle sends with different
sizes to the same socket in this order:

 buff[1]->len = 128
 buff[2]->len = 256

And SQEs like this:

 sqe[1]->len = 128
 sqe[2]->len = 256

If sqe[1] picks buff[1], all is good. But if sqe[2] runs first, then
sqe[1] picks buff[2], and it will only send the first 128 bytes, won't it?
Looking at the patch I don't see how you avoid this condition, but
perhaps I'm missing something?
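
To make the concern concrete, here is a small userspace model of the
scenario above. It is only a sketch of my reading of the patch, not
kernel code: model_buf, model_send_len() and model_run() are made-up
names, buffers are indexed from 0, and the only behaviour modelled is
"each issued send picks the next buffer in ring order and sends
min(sqe->len, buffer len)".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one provided buffer in the ring (not a kernel type). */
struct model_buf {
    size_t len;
};

/* Bytes sent: the smaller of the SQE's len and the picked buffer's len. */
static size_t model_send_len(size_t sqe_len, const struct model_buf *buf)
{
    return sqe_len < buf->len ? sqe_len : buf->len;
}

/*
 * Issue n sends in the order given by issue_order[]. Whichever send runs
 * picks the buffer at the current ring head, so buffers always leave in
 * ring order. sent[i] receives the number of bytes sent from bufs[i].
 */
static void model_run(const size_t *sqe_len, const size_t *issue_order,
                      const struct model_buf *bufs, size_t n, size_t *sent)
{
    for (size_t head = 0; head < n; head++)
        sent[head] = model_send_len(sqe_len[issue_order[head]], &bufs[head]);
}
```

With buffers of 128 and 256 bytes and SQE lengths of 128 and 256,
running the sends in order gives sent = {128, 256}. Running the second
SQE first gives sent = {128, 128}: the data still goes out in ring
order, but the 256-byte buffer is cut short by the 128-byte SQE.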

One suggestion would be to require sqe->len to be 0 when using send with
provided buffers, so we simply use the entire buffer in the ring.  wdyt?
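
Roughly, the rule I have in mind would look like the sketch below. This
is only an illustration in userspace C: proposed_send_len() is a
hypothetical helper, and returning -EINVAL for a non-zero length is my
assumption about how such a check could fail, not something the patch
does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical length rule for a send using a provided buffer:
 * a non-zero sqe->len is rejected, otherwise the whole picked
 * buffer is sent regardless of which request picked it.
 */
static long proposed_send_len(size_t sqe_len, size_t buf_len)
{
    if (sqe_len != 0)
        return -EINVAL;     /* assumed error code, for illustration */
    return (long)buf_len;   /* length comes from the buffer alone */
}
```

Since the length then comes only from the buffer, it no longer matters
which request ends up picking which buffer.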

Thanks,

-- 
Gabriel Krisman Bertazi