On 1/13/23 10:51 AM, Jens Axboe wrote:
> On 1/13/23 3:12 AM, Ming Lei wrote:
>> Hello,
>>
>> On Thu, Jan 12, 2023 at 08:35:36AM +0100, Stefan Metzmacher wrote:
>>> On 12.01.23 at 04:40, Jens Axboe wrote:
>>>> On 1/11/23 8:27 PM, Ming Lei wrote:
>>>>> Hi Stefan and Jens,
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> BTW, the issue was observed while I was writing ublk-nbd:
>>>>>
>>>>> https://github.com/ming1/ubdsrv/commits/nbd
>>>>>
>>>>> It isn't complete yet (multiple send sqe chains are not serialized
>>>>> yet); the issue is triggered when writing a big chunk of data to ublk-nbd.
>>>>
>>>> Gotcha
>>>>
>>>>> On Wed, Jan 11, 2023 at 05:32:00PM +0100, Stefan Metzmacher wrote:
>>>>>> Hi Ming,
>>>>>>
>>>>>>> Per my understanding, a short send on SOCK_STREAM should terminate the
>>>>>>> remainder of the SQE chain built by IOSQE_IO_LINK.
>>>>>>>
>>>>>>> But from my observation, this doesn't hold when using io_sendmsg or
>>>>>>> io_sendmsg_zc on a TCP socket: the remainder of the chain can still
>>>>>>> complete after a short send is seen. MSG_WAITALL is off.
>>>>>>
>>>>>> This is due to legacy reasons, you need to pass MSG_WAITALL explicitly
>>>>>> in order to get a retry or an error on a short write...
>>>>>> It should work for send, sendmsg, sendmsg_zc, recv and recvmsg.
>>>>>
>>>>> Turns out there is another application bug in which a recv sqe may cut
>>>>> into the send sqe chain.
>>>>>
>>>>> After that issue is fixed, if MSG_WAITALL is set, a short send can't be
>>>>> observed any more. But if MSG_WAITALL isn't set, a short send can be
>>>>> observed and the send io chain still won't be terminated.
>>>>
>>>> Right, if MSG_WAITALL is set, then the whole thing will be written. If
>>>> we get a short send, it's retried appropriately. Unless an error occurs,
>>>> it should send the whole thing.
>>>>
>>>>> So if MSG_WAITALL is set, will io_uring be responsible for retrying in
>>>>> case of a short send, so that the application needn't take care of it?
>>>
>>> With new kernels yes, but the application should be prepared to have retry
>>> logic in order to be compatible with older kernels.
>>
>> Now ublk-nbd is usable: mkfs/mount and fio start to work.
>>
>> But a short send can still be observed sometimes when sending an nbd write
>> request, which is done by sendmsg(), and the message includes two vectors
>> (the 1st is the nbd_request, the other is the data to be written to disk).
>>
>> The short send is reported by a cqe in which cqe->res is always 28, which is
>> the size of 'struct nbd_request' and also the length of the 1st io vec. No
>> send cqe failure message is seen.
>>
>> And MSG_WAITALL is actually set for all ublk-nbd io.
>>
>> Steps to reproduce:
>>
>> 1) install liburing 2.0+
>>
>> 2) build ublk & reproduce the issue:
>>
>> - git clone https://github.com/ming1/ubdsrv.git -b nbd
>>
>> - cd ubdsrv
>>
>> - vim build_with_liburing_src && set LIBURING_DIR to your liburing dir
>>
>> - ./build_with_liburing_src && make -j4
>>
>> 3) run the nbd test
>> - cd ubdsrv
>> - make test T=nbd
>>
>> Sometimes the test hangs, and the following log can be observed
>> in syslog:
>>
>> nbd_send_req_done: short send/receive tag 2 op 1 8000000000800002, len 524316 written 28 cqe flags 0
>> ...
>>
>
> I can reproduce this, but it's a SEND that ends up being triggered,
> not a SENDMSG. Should the payload carrying op not be a SENDMSG? I'm
> assuming two vecs for that one.

Added some debug and it looks like the request was indeed set up and is
using IORING_OP_SEND and that the 28 is what was requested. But the
completion side seems to think it's a SENDMSG and we should've received
more? I think this needs a bit of debugging on the userspace side first.

-- 
Jens Axboe
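
For the application-side retry logic that Stefan suggests keeping for older kernels, here is a minimal sketch of the bookkeeping involved. It assumes a two-vector message like the one described above (a 28-byte header followed by the data). The helper name iov_advance is hypothetical and is not part of liburing or ubdsrv. It only works out which iovecs are still unsent after a short send; resubmitting them is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not part of liburing or ubdsrv): after a short
 * send of `done` bytes, skip the iovecs that were sent completely and
 * trim the one that was sent only in part. Stores a pointer to the first
 * unsent iovec in *first and returns how many iovecs remain.
 */
static int iov_advance(struct iovec *iov, int iovcnt, size_t done,
                       struct iovec **first)
{
    int i = 0;

    /* Skip vectors that were sent completely. */
    while (i < iovcnt && done >= iov[i].iov_len) {
        done -= iov[i].iov_len;
        i++;
    }

    /* The caller must not report more bytes than the vectors hold. */
    assert(i < iovcnt || done == 0);

    /* Trim the vector that was sent only in part. */
    if (i < iovcnt && done > 0) {
        iov[i].iov_base = (char *)iov[i].iov_base + done;
        iov[i].iov_len -= done;
    }

    *first = iov + i;
    return iovcnt - i;
}
```

With the numbers from the log above (len 524316, written 28), this would skip the header vector and leave one vector of 524288 bytes. The caller could then point msg_iov and msg_iovlen at the result and queue a new sendmsg for the rest. How that fits into a linked sqe chain depends on the application and is not covered here.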