Re: How to safely disconnect NBD device

Josef Bacik <josef@xxxxxxxxxxxxxx> · Sun, 15 May 2022 10:18:26 -0400

On Sun, May 15, 2022 at 7:36 AM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>
> Hi Josef,
>
> Would you be able to help with the question below?
>
> If I understand linux/MAINTAINERS correctly, then you're currently taking care of the NBD kernel code?
>
> Best,
> -Nikolaus
>
> On Fri, 6 May 2022, at 21:25, Nikolaus Rath wrote:
> > $ nbd-client localhost /dev/nbd1 && mkfs.ext4 /dev/nbd1 && nbd-client -d
> > /dev/nbd1
> >
> > Frequently gives me errors like this:
> >
> > May 02 15:20:50 vostro.rath.org kernel: nbd1: detected capacity change
> > from 0 to 52428800
> > May 02 15:20:50 vostro.rath.org kernel: block nbd1: NBD_DISCONNECT
> > May 02 15:20:50 vostro.rath.org kernel: block nbd1: Disconnected due to
> > user request.
> > May 02 15:20:50 vostro.rath.org kernel: block nbd1: shutting down
> > sockets
> > May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 776
> > op 0x0:(READ) flags 0x80700 phys_seg 29 prio class 0
> > May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 776
> > op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > May 02 15:20:50 vostro.rath.org kernel: Buffer I/O error on dev nbd1,
> > logical block 97, async page read
> > May 02 15:20:50 vostro.rath.org kernel: block nbd1: Attempted send on
> > invalid socket
> > May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 0
> > op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
> > May 02 15:20:50 vostro.rath.org kernel: block nbd1: Attempted send on
> > invalid socket
> > May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 0
> > op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
> >
> > To me, this looks as if the kernel is shutting down the NBD connection
> > while there are still active requests and/or while there is still dirty
> > data that needs to be flushed.
> >
> > Is this expected behavior?
> >
> > If so, what is the recommended way to *safely* disconnect an NBD device?
>

Normally this happens because systemd/udev have rules that trigger a
scan of a device when it is closed after having been opened with
O_EXCL.  mkfs.ext4 should be doing the correct thing and fsync()'ing,
so all of its data should be flushed.  The WRITEs are disconcerting;
the READs I'd expect for sure.  I'd recommend pulling out bpftrace or
something similar to figure out who is issuing WRITEs after the mkfs.
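
Something along these lines might be a starting point for the
bpftrace side.  It is only a sketch: the helper name
nbd_trace_program is made up for illustration, it assumes the
block:block_bio_queue tracepoint is available on your kernel, and it
filters on block major 43 (the NBD major) rather than on one specific
nbd device.  Newer bpftrace versions also accept args.field in place
of args->field.

```shell
#!/bin/sh
# Hypothetical helper: prints a bpftrace program that logs every bio
# queued against an NBD device, with the submitting task.
nbd_trace_program() {
    cat <<'EOF'
tracepoint:block:block_bio_queue
/(args->dev >> 20) == 43/
{
    /* dev_t keeps the major number above the low 20 bits; 43 is NBD */
    printf("%-16s pid=%-7d rwbs=%-4s sector=%llu nr_sector=%u\n",
           comm, pid, args->rwbs, args->sector, args->nr_sector);
}
EOF
}

# Usage (as root, in a second terminal, before running the mkfs):
#   bpftrace -e "$(nbd_trace_program)"
```

Anything with a W or F in the rwbs column after mkfs.ext4 has exited
is a candidate.  Keep in mind that comm can be a kworker if the I/O
comes from writeback rather than directly from a process.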

Unfortunately there's nothing for NBD to do here, since there's no way
for it to predict what requests may come in.  It should be waiting for
all outstanding requests, but new requests coming in will just get EIO.
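
If the late requests do turn out to come from udev, one workaround
you could try is to let the udev event queue drain and flush the
device before disconnecting.  This is a sketch rather than a
guaranteed fix: safe_nbd_disconnect and the DRY_RUN switch are names
I made up, and there is still a small window in which a new event
can arrive between the settle and the disconnect.

```shell
#!/bin/sh
# Hypothetical helper; run as root.
# With DRY_RUN set, the commands are printed instead of executed.
_nbd_run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

safe_nbd_disconnect() {
    dev="$1"
    # Wait for udev to finish handling queued events (e.g. the rescan
    # triggered when mkfs closed the device).
    _nbd_run udevadm settle &&
    # Flush any buffers still held for the block device.
    _nbd_run blockdev --flushbufs "$dev" &&
    # Only then ask the kernel to tear down the connection.
    _nbd_run nbd-client -d "$dev"
}

# Usage:
#   nbd-client localhost /dev/nbd1 && mkfs.ext4 /dev/nbd1 \
#       && safe_nbd_disconnect /dev/nbd1
```
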
Thanks,

Josef