Jens Axboe <axboe@xxxxxxxxx> writes: > It's pretty trivial to wire up provided buffer support for the send > side, just like how it's done the receive side. This enables setting up > a buffer ring that an application can use to push pending sends to, > and then have a send pick a buffer from that ring. > > One of the challenges with async IO and networking sends is that you > can get into reordering conditions if you have more than one inflight > at the same time. Consider the following scenario where everything is > fine: > > 1) App queues sendA for socket1 > 2) App queues sendB for socket1 > 3) App does io_uring_submit() > 4) sendA is issued, completes successfully, posts CQE > 5) sendB is issued, completes successfully, posts CQE > > All is fine. Requests are always issued in-order, and both complete > inline as most sends do. > > However, if we're flooding socket1 with sends, the following could > also result from the same sequence: > > 1) App queues sendA for socket1 > 2) App queues sendB for socket1 > 3) App does io_uring_submit() > 4) sendA is issued, socket1 is full, poll is armed for retry > 5) Space frees up in socket1, this triggers sendA retry via task_work > 6) sendB is issued, completes successfully, posts CQE > 7) sendA is retried, completes successfully, posts CQE > > Now we've sent sendB before sendA, which can make things unhappy. If > both sendA and sendB had been using provided buffers, then it would look > as follows instead: > > 1) App queues dataA for sendA, queues sendA for socket1 > 2) App queues dataB for sendB queues sendB for socket1 > 3) App does io_uring_submit() > 4) sendA is issued, socket1 is full, poll is armed for retry > 5) Space frees up in socket1, this triggers sendA retry via task_work > 6) sendB is issued, picks first buffer (dataA), completes successfully, > posts CQE (which says "I sent dataA") > 7) sendA is retried, picks first buffer (dataB), completes successfully, > posts CQE (which says "I sent dataB") Hi Jens, If I understand correctly, when sending a buffer, we set sr->len to be the smallest between the buffer size and what was requested in sqe->len. But, when we disconnect the buffer from the request, we can get in a situation where the buffers and requests mismatch, and only one buffer gets sent. Say we are sending two buffers through non-bundle sends with different sizes to the same socket in this order: buff[1]->len = 128 buff[2]->len = 256 And SQEs like this: sqe[1]->len = 128 sqe[2]->len = 256 If sqe1 picks buff1 it is all good. But, if sqe[2] runs first, then sqe[1] picks buff2, and it will only send the first 128, won't it? Looking at the patch I don't see how you avoid this condition, but perhaps I'm missing something? One suggestion would be requiring sqe->len to be 0 when using send with provided buffers, so we simply use the entire buffer in the ring. wdyt? Thanks, -- Gabriel Krisman Bertazi