Re: IOSQE_IO_LINK vs. short send of SOCK_STREAM

On 1/13/23 10:51 AM, Jens Axboe wrote:
> On 1/13/23 3:12 AM, Ming Lei wrote:
>> Hello,
>>
>> On Thu, Jan 12, 2023 at 08:35:36AM +0100, Stefan Metzmacher wrote:
>>> Am 12.01.23 um 04:40 schrieb Jens Axboe:
>>>> On 1/11/23 8:27 PM, Ming Lei wrote:
>>>>> Hi Stefan and Jens,
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> BTW, the issue is observed when I write ublk-nbd:
>>>>>
>>>>> https://github.com/ming1/ubdsrv/commits/nbd
>>>>>
>>>>> and it isn't completed yet (multiple send sqe chains are not serialized
>>>>> yet); the issue is triggered when writing a big chunk of data to ublk-nbd.
>>>>
>>>> Gotcha
>>>>
>>>>> On Wed, Jan 11, 2023 at 05:32:00PM +0100, Stefan Metzmacher wrote:
>>>>>> Hi Ming,
>>>>>>
>>>>>>> Per my understanding, a short send on SOCK_STREAM should terminate the
>>>>>>> remainder of the SQE chain built by IOSQE_IO_LINK.
>>>>>>>
>>>>>>> But from my observation, this point isn't true when using io_sendmsg or
>>>>>>> io_sendmsg_zc on TCP socket, and the other remainder of the chain still
>>>>>>> can be completed after one short send is found. MSG_WAITALL is off.
>>>>>>
>>>>>> This is due to legacy reasons, you need to pass MSG_WAITALL explicitly
>>>>>> in order to get a retry or an error on a short write...
>>>>>> It should work for send, sendmsg, sendmsg_zc, recv and recvmsg.
>>>>>
>>>>> Turns out there is another application bug in which a recv sqe may cut
>>>>> into the send sqe chain.
>>>>>
>>>>> After the issue is fixed, if MSG_WAITALL is set, short send can't be
>>>>> observed any more. But if MSG_WAITALL isn't set, short send can be
>>>>> observed and the send io chain still won't be terminated.
>>>>
>>>> Right, if MSG_WAITALL is set, then the whole thing will be written. If
>>>> we get a short send, it's retried appropriately. Unless an error occurs,
>>>> it should send the whole thing.
>>>>
>>>>> So if MSG_WAITALL is set, io_uring will be responsible for retrying in
>>>>> case of a short send, and the application needn't take care of it?
>>>
>>> With new kernels yes, but the application should be prepared to have retry
>>> logic in order to be compatible with older kernels.
>>
>> Now ublk-nbd is usable: mkfs/mount and fio start to work.
>>
>> But a short send can still be observed sometimes when sending an nbd write
>> request, which is done by sendmsg(), and the message includes two vectors
>> (the 1st is the nbd_request, the other one is the data to be written to disk).
>>
>> The short send is reported by a cqe in which cqe->res is always 28, which is
>> the size of 'struct nbd_request' and also the length of the 1st io vec. No
>> send cqe failure message is seen.
>>
>> And MSG_WAITALL is set for all ublk-nbd io actually.
>>
>> Steps to reproduce:
>>
>> 1) install liburing 2.0+
>>
>> 2) build ublk & reproduce the issue:
>>
>> - git clone https://github.com/ming1/ubdsrv.git -b nbd
>>
>> - cd ubdsrv
>>
>> - vim build_with_liburing_src && set LIBURING_DIR to your liburing dir
>>
>> - ./build_with_liburing_src && make -j4
>>
>> 3) run the nbd test
>> - cd ubdsrv
>> - make test T=nbd
>>
>> Sometimes the test hangs, and the following log can be observed
>> in syslog:
>>
>> nbd_send_req_done: short send/receive tag 2 op 1 8000000000800002, len 524316 written 28 cqe flags 0
>> ...
>>
> 
> I can reproduce this, but it's a SEND that ends up being triggered,
> not a SENDMSG. Should the payload carrying op not be a SENDMSG? I'm
> assuming two vecs for that one.

Added some debug and it looks like the request was indeed set up
as IORING_OP_SEND, and that the 28 is what was requested.
But the completion side seems to think it's a SENDMSG and that we
should've sent more?

I think this needs a bit of debugging on the userspace side first.

-- 
Jens Axboe
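
One way to start the userspace debugging suggested above is to record, at
submission time, which operation and length were actually queued, and to
compare that against what the completion handler believes it is handling.
The sketch below is only an illustration of that idea: the user_data layout,
the demo_* names and the opcode values are all hypothetical and are not the
encoding used by ubdsrv or the IORING_OP_* values from the kernel.

```c
#include <stdint.h>

/* Illustrative opcode values only; NOT the kernel's IORING_OP_* numbers. */
enum demo_op { DEMO_OP_SEND = 1, DEMO_OP_SENDMSG = 2 };

enum demo_verdict {
    DEMO_OK,           /* opcode and length agree, full transfer */
    DEMO_ERROR,        /* cqe->res was negative (-errno) */
    DEMO_OP_MISMATCH,  /* handler expects a different opcode than was queued */
    DEMO_LEN_MISMATCH, /* handler expects a different length than was queued */
    DEMO_SHORT         /* same request, but fewer bytes than requested */
};

/* Hypothetical layout: op in bits 48..55, tag in bits 32..47, len in 0..31. */
static uint64_t demo_pack(uint8_t op, uint16_t tag, uint32_t len)
{
    return ((uint64_t)op << 48) | ((uint64_t)tag << 32) | len;
}

/*
 * Compare what was recorded at submission (user_data) with what the
 * completion handler expects (want_op, want_len) and with the result (res).
 */
static enum demo_verdict demo_check(uint64_t user_data, uint8_t want_op,
                                    uint32_t want_len, int32_t res)
{
    uint8_t op = (uint8_t)(user_data >> 48);
    uint32_t len = (uint32_t)user_data;

    if (res < 0)
        return DEMO_ERROR;
    if (op != want_op)
        return DEMO_OP_MISMATCH;
    if (len != want_len)
        return DEMO_LEN_MISMATCH;
    if ((uint32_t)res < len)
        return DEMO_SHORT;
    return DEMO_OK;
}
```

With a check like this, a 28-byte SEND whose completion is handled as a
524316-byte SENDMSG would be reported as an opcode mismatch rather than as
a short send.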

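For the retry logic that the thread says applications should keep for older
kernels, the core step with a two-vector message is advancing the iovec
array past the bytes a short send already transmitted, then resubmitting
the remainder. A minimal sketch, assuming a hypothetical helper named
iov_advance (it is not part of liburing or ubdsrv):

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: skip `done` bytes that an earlier short send already
 * transmitted. Fully sent vectors get iov_len = 0; a partially sent vector
 * has its base moved forward. Returns the index of the first iovec that
 * still has data to send, or `cnt` if everything was sent.
 */
static size_t iov_advance(struct iovec *iov, size_t cnt, size_t done)
{
    size_t i = 0;

    /* Drop vectors that were transmitted completely. */
    while (i < cnt && done >= iov[i].iov_len) {
        done -= iov[i].iov_len;
        iov[i].iov_len = 0;
        i++;
    }
    /* Trim the vector in which the short send stopped. */
    if (i < cnt && done > 0) {
        iov[i].iov_base = (char *)iov[i].iov_base + done;
        iov[i].iov_len -= done;
    }
    return i;
}
```

For the case described in this thread (a 28-byte header vector followed by
a data vector, with cqe->res == 28), the helper would return index 1 and
leave the data vector untouched, so the retry would resend only the payload
starting at &iov[1].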



