Hi Josef,

Just a friendly ping, I am more than happy to test a patch, if you send it
inline in the email, since the pastebin you used expired after 1 day, and I
couldn't access it.

I came across and tested Yu Kuai's patches [1][2] which are for the same issue,
and they indeed fix the hang. Thank you Yu.

[1] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

[2] nbd: fix io hung while disconnecting device
https://lists.debian.org/nbd/2022/04/msg00207.html

I am also happy to test any patches to fix the I/O errors.

Thanks,
Matthew

On Tue, Apr 26, 2022 at 9:47 AM Matthew Ruffell <matthew.ruffell@xxxxxxxxxxxxx> wrote:
>
> Hi Josef,
>
> The pastebin has expired the link, and I can't access your patch.
> Seems to default to 1 day deletion.
>
> Could you please create a new paste or send the patch inline in this
> email thread?
>
> I am more than happy to try the patch out.
>
> Thank you for your analysis.
> Matthew
>
> On Sat, Apr 23, 2022 at 3:24 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 22, 2022 at 1:42 AM Matthew Ruffell
> > <matthew.ruffell@xxxxxxxxxxxxx> wrote:
> > >
> > > Dear maintainers of the nbd subsystem,
> > >
> > > A user has come across an issue which causes the nbd module to hang after a
> > > disconnect where a write has been made to a qemu qcow image file, with qemu-nbd
> > > being the server.
> > >
> >
> > Ok there's two problems here, but I want to make sure I have the right
> > fix for the hang first. Can you apply this patch
> >
> > https://paste.centos.org/view/b1a2d01a
> >
> > and make sure the hang goes away? Once that part is fixed I'll fix
> > the IO errors, this is just us racing with systemd while we teardown
> > the device and then we're triggering a partition read while the device
> > is going down and it's complaining loudly. Before we would
> > set_capacity to 0 whenever we disconnected, but that causes problems
> > with file systems that may still have the device open. However now we
> > only do this if the server does the CLEAR_SOCK ioctl, which clearly
> > can race with systemd poking the device, so I need to make it
> > set_capacity(0) when the last opener closes the device to prevent this
> > style of race.
> >
> > Let me know if that patch fixes the hang, and then I'll work up
> > something for the capacity problem. Thanks,
> >
> > Josef
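
For readers following the capacity race described in the quoted message, here
is a minimal userspace sketch of the idea of zeroing the capacity only when the
last opener closes the device. It is not Josef's patch and not kernel code:
struct model_dev and the model_* functions are hypothetical names invented for
this illustration, and the real driver's locking and block-layer calls are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_dev {
    uint64_t capacity;   /* size in sectors as seen by anyone reading the device */
    unsigned openers;    /* number of handles currently holding the device open */
    bool disconnected;   /* set once the server connection has gone away */
};

/* A new opener (a mounted file system, a udev probe, ...) takes a reference. */
static void model_open(struct model_dev *d)
{
    d->openers++;
}

/*
 * Disconnect only marks the device. The capacity is left alone so that
 * existing openers keep seeing a consistent size.
 */
static void model_disconnect(struct model_dev *d)
{
    d->disconnected = true;
}

/*
 * Dropping a reference: only the last close after a disconnect zeroes the
 * capacity, so nobody can still be reading when the size changes.
 */
static void model_release(struct model_dev *d)
{
    assert(d->openers > 0);
    if (--d->openers == 0 && d->disconnected)
        d->capacity = 0;
}
```

In this model, a device opened twice and then disconnected keeps its capacity
after the first model_release() and only drops to 0 after the second, which is
the ordering the quoted message aims for.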