On Fri, Apr 22, 2022 at 1:42 AM Matthew Ruffell
<matthew.ruffell@xxxxxxxxxxxxx> wrote:
>
> Dear maintainers of the nbd subsystem,
>
> A user has come across an issue which causes the nbd module to hang after a
> disconnect where a write has been made to a qemu qcow image file, with qemu-nbd
> being the server.
>

OK, there are two problems here, but I want to make sure I have the right
fix for the hang first. Can you apply this patch

https://paste.centos.org/view/b1a2d01a

and make sure the hang goes away?

Once that part is fixed I'll fix the IO errors. Those are just us racing
with systemd while we tear down the device: we trigger a partition read
while the device is going down, and it complains loudly.

Previously we would set_capacity to 0 whenever we disconnected, but that
causes problems with file systems that may still have the device open.
Now we only do this if the server does the CLEAR_SOCK ioctl, which clearly
can race with systemd poking the device, so I need to make it
set_capacity(0) when the last opener closes the device to prevent this
style of race.

Let me know if that patch fixes the hang, and then I'll work up something
for the capacity problem.

Thanks,

Josef