On 4/25/24 5:56 AM, Gabriel Krisman Bertazi wrote: > Jens Axboe <axboe@xxxxxxxxx> writes: > >> It's pretty trivial to wire up provided buffer support for the send >> side, just like how it's done the receive side. This enables setting up >> a buffer ring that an application can use to push pending sends to, >> and then have a send pick a buffer from that ring. >> >> One of the challenges with async IO and networking sends is that you >> can get into reordering conditions if you have more than one inflight >> at the same time. Consider the following scenario where everything is >> fine: >> >> 1) App queues sendA for socket1 >> 2) App queues sendB for socket1 >> 3) App does io_uring_submit() >> 4) sendA is issued, completes successfully, posts CQE >> 5) sendB is issued, completes successfully, posts CQE >> >> All is fine. Requests are always issued in-order, and both complete >> inline as most sends do. > > > > > > >> >> However, if we're flooding socket1 with sends, the following could >> also result from the same sequence: >> >> 1) App queues sendA for socket1 >> 2) App queues sendB for socket1 >> 3) App does io_uring_submit() >> 4) sendA is issued, socket1 is full, poll is armed for retry >> 5) Space frees up in socket1, this triggers sendA retry via task_work >> 6) sendB is issued, completes successfully, posts CQE >> 7) sendA is retried, completes successfully, posts CQE >> >> Now we've sent sendB before sendA, which can make things unhappy. If >> both sendA and sendB had been using provided buffers, then it would look >> as follows instead: >> >> 1) App queues dataA for sendA, queues sendA for socket1 >> 2) App queues dataB for sendB queues sendB for socket1 >> 3) App does io_uring_submit() >> 4) sendA is issued, socket1 is full, poll is armed for retry >> 5) Space frees up in socket1, this triggers sendA retry via task_work >> 6) sendB is issued, picks first buffer (dataA), completes successfully, >> posts CQE (which says "I sent dataA") >> 7) sendA is retried, picks first buffer (dataB), completes successfully, >> posts CQE (which says "I sent dataB") > > Hi Jens, > > If I understand correctly, when sending a buffer, we set sr->len to be > the smallest between the buffer size and what was requested in sqe->len. > But, when we disconnect the buffer from the request, we can get in a > situation where the buffers and requests mismatch, and only one buffer > gets sent. > > Say we are sending two buffers through non-bundle sends with different > sizes to the same socket in this order: > > buff[1]->len = 128 > buff[2]->len = 256 > > And SQEs like this: > > sqe[1]->len = 128 > sqe[2]->len = 256 > > If sqe1 picks buff1 it is all good. But, if sqe[2] runs first, then > sqe[1] picks buff2, and it will only send the first 128, won't it? > Looking at the patch I don't see how you avoid this condition, but > perhaps I'm missing something? > > One suggestion would be requiring sqe->len to be 0 when using send with > provided buffers, so we simply use the entire buffer in > the ring. wdyt? It might not hurt to just enforce it to be 0, in fact I think any sane use case would do that and I don't think the above use case is a very valid one. It's a bit of "you get to keep both pieces when it breaks". Do you want to send a patch that just enforces it to be 0? We do have that requirement in other spots for provided buffers and multishot, so I think it'll make sense to do here too regardless of the sanity of the use case. -- Jens Axboe