On Mon, Sep 25, 2023 at 09:48:38AM +0200, Christoph Hellwig wrote:
> On Wed, Sep 20, 2023 at 03:41:11PM -0500, Samuel Holland wrote:
> > [ 14.619101] Buffer I/O error on dev nbd0, logical block 0, async page read
> >
> > [ 14.630490] nbd0: unable to read partition table
> >
> > I wonder if disk_force_media_change() is the right thing to call
> > here instead.
>
> So what are the semantics of clearing the socket?
>
> The <= 6.5 behavior of invalidating fs caches, but not actually marking
> the fs shutdown is pretty broken, especially if this expects to resurrect
> the device and thus the file system later on.

nbd-client -d calls

    ioctl(nbd, NBD_DISCONNECT);
    ioctl(nbd, NBD_CLEAR_SOCK);

(error handling removed for clarity)

where "nbd" is the file handle to the nbd device. This expects that the
device is cleared and that then the device can be reused for a different
connection, much like "losetup -d". Expecting that the next connection
would talk to the same file system is wrong.

In netlink mode, it obviously doesn't use the ioctl()s, but instead
sends an NBD_CMD_DISCONNECT command, without any NBD_CLEAR_SOCK, for
which no equivalent message exists. At this point, obviously the same
result is expected in userspace, i.e., the device should now be
available for the next connection that may or may not be the same one.

nbd-client also has a "-persist" option that used to work. This does
expect to resurrect the device and file system. It depends on semantics
where the kernel would block IO to the device until the nbd-client
process that initiated the connection exits, thus allowing it to
re-establish the connection if possible. When doing this, we don't issue
a DISCONNECT or CLEAR_SOCK message and obviously the client is expected
to re-establish a connection to the same device, thus some state should
be retained.
These semantics have however been broken at some point over the past
decade or so, but I didn't notice that at the time, so I didn't
complain, and it's therefore probably not relevant anymore. We should
perhaps rethink whether this is still a good idea given the way the
netlink mode does not have a client waiting for a return of the ioctl()
call, and if so how to implement a replacement.

Kind regards,

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.
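
P.S. For reference, the ioctl sequence described above for "nbd-client -d" could look roughly like this with the error handling put back in. This is only a sketch, not the actual nbd-client source: the helper name nbd_teardown() is made up for illustration, and it assumes a Linux system where <linux/nbd.h> is available and "nbd" is an already-opened descriptor for the device.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nbd.h>   /* NBD_DISCONNECT, NBD_CLEAR_SOCK */

/*
 * Hypothetical helper (not taken from nbd-client): disconnect an
 * already-opened nbd device and clear its socket.
 * Returns 0 on success, -1 with errno preserved on failure.
 */
int nbd_teardown(int nbd)
{
    int saved;

    /* Step 1: request the disconnect. */
    if (ioctl(nbd, NBD_DISCONNECT) < 0) {
        saved = errno;
        fprintf(stderr, "NBD_DISCONNECT failed: %s\n", strerror(saved));
        errno = saved;   /* fprintf() may have clobbered errno */
        return -1;
    }

    /* Step 2: clear the socket; userspace expects the device to be
     * reusable for a different connection afterwards. */
    if (ioctl(nbd, NBD_CLEAR_SOCK) < 0) {
        saved = errno;
        fprintf(stderr, "NBD_CLEAR_SOCK failed: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }

    return 0;
}
```

A caller would open the device node (for example /dev/nbd0) with open(2), pass the resulting descriptor to nbd_teardown(), and close it afterwards. The netlink path has no counterpart to the second step, as noted above.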