> On 16 Oct 2024, at 03:20, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Oct 15, 2024 at 06:06:01PM +0200, Kevin Wolf wrote:
>> Am 15.10.2024 um 14:59 hat Ming Lei geschrieben:
>>> On Tue, Oct 15, 2024 at 08:15:17PM +0800, Ming Lei wrote:
>>>> On Tue, Oct 15, 2024 at 8:11 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Oct 15, 2024 at 08:01:43PM +0800, Ming Lei wrote:
>>>>>> On Tue, Oct 15, 2024 at 6:22 PM Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> the other day I was running some benchmarks to compare different QEMU
>>>>>>> block exports, and one of the scenarios I was interested in was
>>>>>>> exporting NBD from qemu-storage-daemon over a unix socket and attaching
>>>>>>> it as a block device using the kernel NBD client. I would then run a VM
>>>>>>> on top of it and fio inside of it.
>>>>>>>
>>>>>>> Unfortunately, I couldn't get any numbers because the connection always
>>>>>>> aborted with messages like "Double reply on req ..." or "Unexpected
>>>>>>> reply ..." in the host kernel log.
>>>>>>>
>>>>>>> Yesterday I found some time to have a closer look at why this is
>>>>>>> happening, and I think I have a rough understanding of what's going on
>>>>>>> now. Look at these trace events:
>>>>>>>
>>>>>>> qemu-img-51025 [005] ..... 19503.285423: nbd_header_sent: nbd transport event: request 000000002df03708, handle 0x0000150c0000005a
>>>>>>> [...]
>>>>>>> qemu-img-51025 [008] ..... 19503.285500: nbd_payload_sent: nbd transport event: request 000000002df03708, handle 0x0000150c0000005d
>>>>>>> [...]
>>>>>>> kworker/u49:1-47350 [004] ..... 19503.285514: nbd_header_received: nbd transport event: request 00000000b79e7443, handle 0x0000150c0000005a
>>>>>>>
>>>>>>> This is the same request, but the handle has changed between
>>>>>>> nbd_header_sent and nbd_payload_sent!
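[Editor's note: as I read drivers/block/nbd.c, the 64-bit handle in these traces packs the blk-mq unique tag into its low 32 bits and a per-command reuse cookie into the upper bits. The following userspace model of that packing (the `_model` names are mine, not the driver's) uses the values from the trace above to show why a requeue that lands on a new tag produces a handle the reply path no longer matches:]

```c
#include <assert.h>
#include <stdint.h>

#define NBD_COOKIE_BITS 32

/* Userspace model of the nbd handle layout: low 32 bits carry the
 * blk-mq unique tag, the high bits a per-command reuse cookie. */
static uint64_t nbd_cmd_handle_model(uint32_t cmd_cookie, uint32_t unique_tag)
{
    return ((uint64_t)cmd_cookie << NBD_COOKIE_BITS) | unique_tag;
}

/* The reply path recovers the tag from the low 32 bits of the handle. */
static uint32_t nbd_handle_to_tag_model(uint64_t handle)
{
    return (uint32_t)handle;
}
```

With cookie 0x150c and tag 0x5a this yields exactly the 0x0000150c0000005a seen at nbd_header_sent; re-running the same request with tag 0x5d yields the 0x0000150c0000005d seen at nbd_payload_sent, so the server's reply (carrying the old handle) no longer matches what the driver is waiting for.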
>>>>>>> I think this means that we hit
>>>>>>> one of the cases where the request is requeued, and the next time it
>>>>>>> is executed, it gets a different blk-mq tag, which is something the
>>>>>>> nbd driver doesn't seem to expect.
>>>>>>>
>>>>>>> Of course, since the cookie is transmitted in the header, the server
>>>>>>> replies with the original handle that contains the tag from the first
>>>>>>> call, while the kernel is only waiting for a handle with the new tag
>>>>>>> and is confused by the server's response.
>>>>>>>
>>>>>>> I'm not sure yet which of the following options should be considered
>>>>>>> the real problem here, so I'm only describing the situation without
>>>>>>> trying to provide a patch:
>>>>>>>
>>>>>>> 1. Is it that blk-mq should always re-run the request with the same
>>>>>>>    tag? I don't expect so, though in practice I was surprised to see
>>>>>>>    how often a request that nbd requeues actually does end up with
>>>>>>>    the same cookie again.
>>>>>>
>>>>>> No.
>>>>>>
>>>>>> request->tag will change, but we may take ->internal_tag (sched) or
>>>>>> ->tag (none), which won't change.
>>>>>>
>>>>>> I guess was_interrupted() in nbd_send_cmd() is triggered, and then the
>>>>>> payload is sent with a different tag.
>>>>>>
>>>>>> I will try to cook one patch soon.
>>>>>
>>>>> Please try the following patch:
>>>>
>>>> Oops, please ignore the patch, it can't work since
>>>> nbd_handle_reply() doesn't know the static tag.
>>>
>>> Please try v2:
>>
>> It doesn't fully work, though it replaced the bug with a different one.
>> Now I get "Unexpected request" for the final flush request.
>
> That just shows the approach is working.
>
> A flush request doesn't have a static tag, which is why it failed.
> It shouldn't be hard to cover it; please try the attached & revised
> patch.
>
> Another solution is to add a per-nbd-device map for retrieving the
> nbd_cmd by the `index` stored in the cookie, at the cost of one such
> array per device.
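[Editor's note: the per-nbd-device map alternative Ming mentions could look roughly like the userspace sketch below. Every name here (`nbd_dev_map`, `cmd_slot`, `map_store`, `map_lookup`, `MAX_INFLIGHT`) is hypothetical; a real version would be sized by the tag-set depth and need proper locking. The point is that the cookie carries a static per-command index rather than the blk-mq tag, so a requeue that changes the tag no longer breaks reply matching:]

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 64     /* illustrative; really the tag-set depth */

struct cmd_slot {
    int in_use;
    uint64_t cookie;        /* cookie that was sent in the request header */
};

struct nbd_dev_map {
    struct cmd_slot slots[MAX_INFLIGHT];
};

/* Record a command under its stable index; the returned cookie embeds
 * the index in the low 32 bits and a reuse counter in the high bits. */
static uint64_t map_store(struct nbd_dev_map *m, uint32_t index,
                          uint32_t reuse)
{
    uint64_t cookie = ((uint64_t)reuse << 32) | index;

    m->slots[index].in_use = 1;
    m->slots[index].cookie = cookie;
    return cookie;
}

/* On reply, recover the slot from the index in the cookie; a stale or
 * unknown cookie yields NULL, i.e. an "Unexpected reply". */
static struct cmd_slot *map_lookup(struct nbd_dev_map *m, uint64_t cookie)
{
    uint32_t index = (uint32_t)cookie;

    if (index >= MAX_INFLIGHT || !m->slots[index].in_use)
        return NULL;
    if (m->slots[index].cookie != cookie)
        return NULL;        /* reuse counter mismatch: stale reply */
    return &m->slots[index];
}
```

The cost Ming notes is one such array per device; the benefit is that lookup no longer depends on any blk-mq tag surviving a requeue.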
>
>>
>> Anyway, before talking about specific patches, would this even be the
>> right solution, or would it only paper over a bigger issue?
>>
>> Is getting a different tag the only thing that can go wrong if you
>> handle a request partially and then requeue it?
>
> Strictly speaking, it is BLK_STS_RESOURCE.
>
> Unlike a userspace implementation, kernel nbd calls one sock_sendmsg()
> to send either the request header or each single data bvec, so a
> partial xmit can't be avoided. This kind of handling is fine: since TCP
> is just a byte stream, no difference is observed from the nbd server
> side as long as the data is correct.
>
>
> Thanks,
> Ming
> <static-tag.patch>
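[Editor's note: the partial-transmit situation described above, where one sock_sendmsg() per header or bvec may send fewer bytes than asked, can be modeled in userspace as below. `fake_send()`, `struct xmit_state` and `try_send()` are made-up names, not driver code; the model just shows why recording the progress and resuming after a requeue is safe over a byte stream, while anything derived from the (possibly changed) tag is not:]

```c
#include <stddef.h>

struct xmit_state {
    size_t len;     /* total bytes of the header or current bvec */
    size_t sent;    /* progress that must survive a requeue */
};

/* Stand-in for sock_sendmsg() on a nearly full socket buffer: accepts
 * at most *budget bytes per call, consuming the budget. */
static size_t fake_send(size_t len, size_t *budget)
{
    size_t n = len < *budget ? len : *budget;

    *budget -= n;
    return n;
}

/* Returns 1 once everything is sent, 0 when the send stalled and the
 * request should be requeued to resume later from st->sent. */
static int try_send(struct xmit_state *st, size_t *budget)
{
    while (st->sent < st->len) {
        size_t n = fake_send(st->len - st->sent, budget);

        if (n == 0)
            return 0;   /* would block: requeue and retry later */
        st->sent += n;
    }
    return 1;
}
```

Because TCP delivers a single ordered byte stream, the server sees a resumed transmission as identical to an uninterrupted one; the problem in this thread arises only because the header already on the wire carries the old cookie while the requeued request computes a new one.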