On 5/4/22 9:21 AM, Constantine Gavrilov wrote:
> On Wed, May 4, 2022 at 4:54 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 5/3/22 5:05 PM, Constantine Gavrilov wrote:
>>> Jens:
>>>
>>> This is related to the previous thread "Fix MSG_WAITALL for
>>> IORING_OP_RECV/RECVMSG".
>>>
>>> We have a similar issue with TCP socket sends. I see short sends
>>> regardless of the method (I tried the write, writev, send, and
>>> sendmsg opcodes, using MSG_WAITALL for send and sendmsg). It does
>>> not make a difference.
>>>
>>> Most of the time, sends are not short, and I never saw short sends
>>> with loopback and my app. But on real network media, I do see short
>>> sends.
>>>
>>> This is a real problem, because it makes it impossible to keep a
>>> queue depth of > 1 on a TCP socket, which limits the benefit of
>>> IORING. When we get a short send, the next send in the queue
>>> "corrupts" the stream.
>>>
>>> Can we have the send complete in full before the request completes,
>>> unless the socket is disconnected?
>>
>> I'm guessing that this happens because we get a task_work item queued
>> after we've processed some of the send, but not all of it. What
>> kernel are you using?
>>
>> This:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=4c3c09439c08b03d9503df0ca4c7619c5842892e
>>
>> is queued up for 5.19 and would be worth trying.
>>
>> --
>> Jens Axboe
>>
>
> Jens:
>
> Thank you for your reply.
>
> The kernel is 5.17.4-200.fc35.x86_64. I have looked at the patch. With
> that solution in place, I am wondering whether it will be possible to
> keep multiple uring send IOs in flight on the same socket. I expect
> that Linux TCP will serialize multiple send operations on the same
> socket, but I am not sure that happens with uring (meaning that the
> socket is blocked from processing a new IO until the pending IO
> completes). Do I need IOSQE_IO_DRAIN / IOSQE_IO_LINK for this to work?
> That would not be optimal, because I have multiple different sockets
> in the same uring. While I already have a workaround in the form of a
> "software" queue for streaming data on TCP sockets, I would rather
> have the kernel do "native" queueing in the sockets layer and have the
> extra CPU cycles available to the application.

The patch above will potentially mess with ordering. If the cause is
what I suspect, task_work causing it to think it's signaled, then the
better approach may indeed be to just flush that work and retry without
re-queueing the current one. I can try a patch against 5.18 if you are
willing and able to test?

--
Jens Axboe
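
[Editor's note: a minimal sketch of the IOSQE_IO_LINK approach mentioned
above, chaining two sends on one TCP socket with liburing so the second
SQE is not started until the first completes. The socket fd and buffers
are placeholders, and this does not by itself fix the short-send issue
under discussion: a short send still completes the first SQE with a
partial byte count, which the application must check.]

/*
 * Sketch: serialize two sends on one socket by linking the SQEs with
 * IOSQE_IO_LINK. sockfd, buf1 and buf2 are hypothetical placeholders.
 * Note: a short send does not fail the link; it simply completes with
 * a partial byte count, so cqe->res still has to be checked.
 */
#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static int send_two_linked(struct io_uring *ring, int sockfd,
                           const char *buf1, size_t len1,
                           const char *buf2, size_t len2)
{
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int i, ret;

        /* First send: mark it linked so the next SQE waits for it. */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, sockfd, buf1, len1, MSG_WAITALL);
        sqe->flags |= IOSQE_IO_LINK;
        sqe->user_data = 1;

        /* Second send: only started after the first one completes. */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, sockfd, buf2, len2, MSG_WAITALL);
        sqe->user_data = 2;

        ret = io_uring_submit(ring);
        if (ret < 0)
                return ret;

        /* Reap both completions; res may still be a short count. */
        for (i = 0; i < 2; i++) {
                ret = io_uring_wait_cqe(ring, &cqe);
                if (ret < 0)
                        return ret;
                printf("send %llu: res=%d\n",
                       (unsigned long long)cqe->user_data, cqe->res);
                io_uring_cqe_seen(ring, cqe);
        }
        return 0;
}

int main(void)
{
        struct io_uring ring;
        int ret;

        ret = io_uring_queue_init(8, &ring, 0);
        if (ret < 0) {
                fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                return 1;
        }
        /* A connected TCP socket would go here; -1 is a placeholder. */
        send_two_linked(&ring, -1, "hello ", 6, "world\n", 6);
        io_uring_queue_exit(&ring);
        return 0;
}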