On 5/4/22 9:21 AM, Constantine Gavrilov wrote:
> On Wed, May 4, 2022 at 4:54 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 5/3/22 5:05 PM, Constantine Gavrilov wrote:
>>> Jens:
>>>
>>> This is related to the previous thread "Fix MSG_WAITALL for
>>> IORING_OP_RECV/RECVMSG".
>>>
>>> We have a similar issue with TCP socket sends. I see short sends
>>> regardless of the method (I tried the write, writev, send, and
>>> sendmsg opcodes, using MSG_WAITALL for send and sendmsg). It does
>>> not make a difference.
>>>
>>> Most of the time, sends are not short, and I never saw short sends
>>> with loopback and my app. But on real network media, I do see short
>>> sends.
>>>
>>> This is a real problem, because it makes it impossible to keep a
>>> queue depth of > 1 on a TCP socket, which limits the benefit of
>>> IORING. When we get a short send, the next send in the queue
>>> "corrupts" the stream.
>>>
>>> Can we have the send complete in full before the request completes,
>>> unless the socket is disconnected?
>>
>> I'm guessing that this happens because we get a task_work item queued
>> after we've processed some of the send, but not all of it. What
>> kernel are you using?
>>
>> This:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=4c3c09439c08b03d9503df0ca4c7619c5842892e
>>
>> is queued up for 5.19 and would be worth trying.
>>
>> --
>> Jens Axboe
>>
>
> Jens:
>
> Thank you for your reply.
>
> The kernel is 5.17.4-200.fc35.x86_64. I have looked at the patch. With
> that solution in place, I am wondering whether it will be possible to
> keep multiple uring send IOs in flight on the same socket. I expect
> that Linux TCP will serialize multiple send operations on the same
> socket, but I am not sure that happens with uring (meaning that the
> socket is blocked from processing a new IO until the pending IO
> completes). Do I need IOSQE_IO_DRAIN / IOSQE_IO_LINK for this to work?
> That would not be optimal, because I have multiple different sockets
> in the same uring. While I already have a workaround in the form of a
> "software" queue for streaming data on TCP sockets, I would rather
> have the kernel do "native" queueing in the sockets layer and have the
> extra CPU cycles available to the application.

The patch above will potentially mess with ordering. If the cause is
what I suspect, task_work causing it to think it's signaled, then the
better approach may indeed be to just flush that work and retry without
re-queueing the current one. I can try a patch against 5.18 if you are
willing and able to test?

--
Jens Axboe
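
[Editor's note: a minimal sketch of the IOSQE_IO_LINK approach mentioned
above, chaining two sends on one TCP socket with liburing so the second
SQE is not started until the first completes. The socket fd and buffers
are placeholders, and this does not by itself fix the short-send issue
under discussion: a short send still completes the first SQE with a
partial byte count, which the application must check.]

/*
 * Sketch: serialize two sends on one socket by linking the SQEs with
 * IOSQE_IO_LINK. sockfd, buf1 and buf2 are hypothetical placeholders.
 * Note: a short send does not fail the link; it simply completes with
 * a partial byte count, so cqe->res still has to be checked.
 */
#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static int send_two_linked(struct io_uring *ring, int sockfd,
                           const char *buf1, size_t len1,
                           const char *buf2, size_t len2)
{
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int i, ret;

        /* First send: mark it linked so the next SQE waits for it. */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, sockfd, buf1, len1, MSG_WAITALL);
        sqe->flags |= IOSQE_IO_LINK;
        sqe->user_data = 1;

        /* Second send: only started after the first one completes. */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, sockfd, buf2, len2, MSG_WAITALL);
        sqe->user_data = 2;

        ret = io_uring_submit(ring);
        if (ret < 0)
                return ret;

        /* Reap both completions; res may still be a short count. */
        for (i = 0; i < 2; i++) {
                ret = io_uring_wait_cqe(ring, &cqe);
                if (ret < 0)
                        return ret;
                printf("send %llu: res=%d\n",
                       (unsigned long long)cqe->user_data, cqe->res);
                io_uring_cqe_seen(ring, cqe);
        }
        return 0;
}

int main(void)
{
        struct io_uring ring;
        int ret;

        ret = io_uring_queue_init(8, &ring, 0);
        if (ret < 0) {
                fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                return 1;
        }
        /* A connected TCP socket would go here; -1 is a placeholder. */
        send_two_linked(&ring, -1, "hello ", 6, "world\n", 6);
        io_uring_queue_exit(&ring);
        return 0;
}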