> On 16 Oct 2024, at 03:20, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Oct 15, 2024 at 06:06:01PM +0200, Kevin Wolf wrote:
>> Am 15.10.2024 um 14:59 hat Ming Lei geschrieben:
>>> On Tue, Oct 15, 2024 at 08:15:17PM +0800, Ming Lei wrote:
>>>> On Tue, Oct 15, 2024 at 8:11 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Oct 15, 2024 at 08:01:43PM +0800, Ming Lei wrote:
>>>>>> On Tue, Oct 15, 2024 at 6:22 PM Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> the other day I was running some benchmarks to compare different QEMU
>>>>>>> block exports, and one of the scenarios I was interested in was
>>>>>>> exporting NBD from qemu-storage-daemon over a unix socket and attaching
>>>>>>> it as a block device using the kernel NBD client. I would then run a VM
>>>>>>> on top of it and fio inside of it.
>>>>>>>
>>>>>>> Unfortunately, I couldn't get any numbers because the connection always
>>>>>>> aborted with messages like "Double reply on req ..." or "Unexpected
>>>>>>> reply ..." in the host kernel log.
>>>>>>>
>>>>>>> Yesterday I found some time to have a closer look at why this is
>>>>>>> happening, and I think I have a rough understanding of what's going on
>>>>>>> now. Look at these trace events:
>>>>>>>
>>>>>>> qemu-img-51025 [005] ..... 19503.285423: nbd_header_sent: nbd transport event: request 000000002df03708, handle 0x0000150c0000005a
>>>>>>> [...]
>>>>>>> qemu-img-51025 [008] ..... 19503.285500: nbd_payload_sent: nbd transport event: request 000000002df03708, handle 0x0000150c0000005d
>>>>>>> [...]
>>>>>>> kworker/u49:1-47350 [004] ..... 19503.285514: nbd_header_received: nbd transport event: request 00000000b79e7443, handle 0x0000150c0000005a
>>>>>>>
>>>>>>> This is the same request, but the handle has changed between
>>>>>>> nbd_header_sent and nbd_payload_sent!
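[Editor's note: as I read drivers/block/nbd.c, the 64-bit handle in these traces packs the blk-mq unique tag into its low 32 bits and a per-command reuse cookie into the upper bits. The following userspace model of that packing (the `_model` names are mine, not the driver's) uses the values from the trace above to show why a requeue that lands on a new tag produces a handle the reply path no longer matches:]

```c
#include <assert.h>
#include <stdint.h>

#define NBD_COOKIE_BITS 32

/* Userspace model of the nbd handle layout: low 32 bits carry the
 * blk-mq unique tag, the high bits a per-command reuse cookie. */
static uint64_t nbd_cmd_handle_model(uint32_t cmd_cookie, uint32_t unique_tag)
{
    return ((uint64_t)cmd_cookie << NBD_COOKIE_BITS) | unique_tag;
}

/* The reply path recovers the tag from the low 32 bits of the handle. */
static uint32_t nbd_handle_to_tag_model(uint64_t handle)
{
    return (uint32_t)handle;
}
```

With cookie 0x150c and tag 0x5a this yields exactly the 0x0000150c0000005a seen at nbd_header_sent; re-running the same request with tag 0x5d yields the 0x0000150c0000005d seen at nbd_payload_sent, so the server's reply (carrying the old handle) no longer matches what the driver is waiting for.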
>>>>>>> I think this means that we hit
>>>>>>> one of the cases where the request is requeued, and the next time it
>>>>>>> is executed, it gets a different blk-mq tag, which is something the
>>>>>>> nbd driver doesn't seem to expect.
>>>>>>>
>>>>>>> Of course, since the cookie is transmitted in the header, the server
>>>>>>> replies with the original handle that contains the tag from the first
>>>>>>> call, while the kernel is only waiting for a handle with the new tag
>>>>>>> and is confused by the server's response.
>>>>>>>
>>>>>>> I'm not sure yet which of the following options should be considered
>>>>>>> the real problem here, so I'm only describing the situation without
>>>>>>> trying to provide a patch:
>>>>>>>
>>>>>>> 1. Is it that blk-mq should always re-run the request with the same
>>>>>>>    tag? I don't expect so, though in practice I was surprised to see
>>>>>>>    how often a request that nbd requeues actually does end up with
>>>>>>>    the same cookie again.
>>>>>>
>>>>>> No.
>>>>>>
>>>>>> request->tag will change, but we may take ->internal_tag (sched) or
>>>>>> ->tag (none), which won't change.
>>>>>>
>>>>>> I guess was_interrupted() in nbd_send_cmd() is triggered, and then the
>>>>>> payload is sent with a different tag.
>>>>>>
>>>>>> I will try to cook one patch soon.
>>>>>
>>>>> Please try the following patch:
>>>>
>>>> Oops, please ignore the patch, it can't work since
>>>> nbd_handle_reply() doesn't know the static tag.
>>>
>>> Please try v2:
>>
>> It doesn't fully work, though it replaced the bug with a different one.
>> Now I get "Unexpected request" for the final flush request.
>
> That just shows the approach is working.
>
> A flush request doesn't have a static tag, which is why it failed.
> It shouldn't be hard to cover it; please try the attached & revised
> patch.
>
> Another solution is to add a per-nbd-device map for retrieving the
> nbd_cmd by the `index` stored in the cookie, at the cost of one such
> array per device.
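[Editor's note: the per-nbd-device map alternative Ming mentions could look roughly like the userspace sketch below. Every name here (`nbd_dev_map`, `cmd_slot`, `map_store`, `map_lookup`, `MAX_INFLIGHT`) is hypothetical; a real version would be sized by the tag-set depth and need proper locking. The point is that the cookie carries a static per-command index rather than the blk-mq tag, so a requeue that changes the tag no longer breaks reply matching:]

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 64     /* illustrative; really the tag-set depth */

struct cmd_slot {
    int in_use;
    uint64_t cookie;        /* cookie that was sent in the request header */
};

struct nbd_dev_map {
    struct cmd_slot slots[MAX_INFLIGHT];
};

/* Record a command under its stable index; the returned cookie embeds
 * the index in the low 32 bits and a reuse counter in the high bits. */
static uint64_t map_store(struct nbd_dev_map *m, uint32_t index,
                          uint32_t reuse)
{
    uint64_t cookie = ((uint64_t)reuse << 32) | index;

    m->slots[index].in_use = 1;
    m->slots[index].cookie = cookie;
    return cookie;
}

/* On reply, recover the slot from the index in the cookie; a stale or
 * unknown cookie yields NULL, i.e. an "Unexpected reply". */
static struct cmd_slot *map_lookup(struct nbd_dev_map *m, uint64_t cookie)
{
    uint32_t index = (uint32_t)cookie;

    if (index >= MAX_INFLIGHT || !m->slots[index].in_use)
        return NULL;
    if (m->slots[index].cookie != cookie)
        return NULL;        /* reuse counter mismatch: stale reply */
    return &m->slots[index];
}
```

The cost Ming notes is one such array per device; the benefit is that lookup no longer depends on any blk-mq tag surviving a requeue.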
>
>>
>> Anyway, before talking about specific patches, would this even be the
>> right solution, or would it only paper over a bigger issue?
>>
>> Is getting a different tag the only thing that can go wrong if you
>> handle a request partially and then requeue it?
>
> Strictly speaking, it is BLK_STS_RESOURCE.
>
> Unlike a userspace implementation, kernel nbd calls one sock_sendmsg()
> to send either the request header or each single data bvec, so a
> partial xmit can't be avoided. This kind of handling is fine: since TCP
> is just a byte stream, no difference is observed from the nbd server
> side as long as the data is correct.
>
>
> Thanks,
> Ming
> <static-tag.patch>
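[Editor's note: the partial-transmit situation described above, where one sock_sendmsg() per header or bvec may send fewer bytes than asked, can be modeled in userspace as below. `fake_send()`, `struct xmit_state` and `try_send()` are made-up names, not driver code; the model just shows why recording the progress and resuming after a requeue is safe over a byte stream, while anything derived from the (possibly changed) tag is not:]

```c
#include <stddef.h>

struct xmit_state {
    size_t len;     /* total bytes of the header or current bvec */
    size_t sent;    /* progress that must survive a requeue */
};

/* Stand-in for sock_sendmsg() on a nearly full socket buffer: accepts
 * at most *budget bytes per call, consuming the budget. */
static size_t fake_send(size_t len, size_t *budget)
{
    size_t n = len < *budget ? len : *budget;

    *budget -= n;
    return n;
}

/* Returns 1 once everything is sent, 0 when the send stalled and the
 * request should be requeued to resume later from st->sent. */
static int try_send(struct xmit_state *st, size_t *budget)
{
    while (st->sent < st->len) {
        size_t n = fake_send(st->len - st->sent, budget);

        if (n == 0)
            return 0;   /* would block: requeue and retry later */
        st->sent += n;
    }
    return 1;
}
```

Because TCP delivers a single ordered byte stream, the server sees a resumed transmission as identical to an uninterrupted one; the problem in this thread arises only because the header already on the wire carries the old cookie while the requeued request computes a new one.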