Re: [BUG REPORT][BLOCK/NBD] Error when accessing qcow2 image through NBD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Le 31/05/2024 à 17:17, Josef Bacik a écrit :
On Fri, May 31, 2024 at 1:48 AM Michel LAFON-PUYO <michel@xxxxxxxxxxxxx> wrote:
Hi!


When switching from version 6.8.x to version 6.9.x, I've noticed errors when mounting NBD device:

mount: /tmp/test: can't read superblock on /dev/nbd0.
         dmesg(1) may have more information after failed mount system call.

dmesg shows this kind of messages:

[    5.138056] mount: attempt to access beyond end of device
                 nbd0: rw=4096, sector=2, nr_sectors = 2 limit=0
[    5.138062] EXT4-fs (nbd0): unable to read superblock
[    5.140097] nbd0: detected capacity change from 0 to 1024000

or

[  144.431247] blk_print_req_error: 61 callbacks suppressed
[  144.431250] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
[  144.431254] buffer_io_error: 66 callbacks suppressed
[  144.431255] Buffer I/O error on dev nbd0, logical block 0, async page read
[  144.431258] Buffer I/O error on dev nbd0, logical block 1, async page read
[  144.431259] Buffer I/O error on dev nbd0, logical block 2, async page read
[  144.431260] Buffer I/O error on dev nbd0, logical block 3, async page read
[  144.431273] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.431275] Buffer I/O error on dev nbd0, logical block 0, async page read
[  144.431278] I/O error, dev nbd0, sector 2 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.431279] Buffer I/O error on dev nbd0, logical block 1, async page read
[  144.431282] I/O error, dev nbd0, sector 4 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.431283] Buffer I/O error on dev nbd0, logical block 2, async page read
[  144.431286] I/O error, dev nbd0, sector 6 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.431287] Buffer I/O error on dev nbd0, logical block 3, async page read
[  144.431289]  nbd0: unable to read partition table
[  144.435144] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.435154] Buffer I/O error on dev nbd0, logical block 0, async page read
[  144.435161] I/O error, dev nbd0, sector 2 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.435166] Buffer I/O error on dev nbd0, logical block 1, async page read
[  144.435170] I/O error, dev nbd0, sector 4 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.436007] I/O error, dev nbd0, sector 6 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.436023] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  144.436034]  nbd0: unable to read partition table
[  144.437036]  nbd0: unable to read partition table
[  144.438712]  nbd0: unable to read partition table

It can be reproduced on v6.10-rc1.

I've bisected the commits between v6.8 tag and v6.9 tag on vanilla master branch and found out that commit 242a49e5c8784e93a99e4dc4277b28a8ba85eac5 seems to introduce this regression. When reverting this commit, everything seems fine.

There is only one change in this commit in drivers/block/nbd.c.

-static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
+static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,

+static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
+               loff_t blksize)
+{
+       int error;
+
+       blk_mq_freeze_queue(nbd->disk->queue);
+       error = __nbd_set_size(nbd, bytesize, blksize);
+       blk_mq_unfreeze_queue(nbd->disk->queue);
+
+       return error;
+}
+

To reproduce the issue, you need qemu-img and qemu-nbd. Executing the following script (as root) triggers the issue. This is not systematic but running the script once or twice is generally sufficient to get an error.

qemu-img create -f qcow2 test.img 500M
qemu-nbd -c /dev/nbd0 test.img
mkfs.ext4 /dev/nbd0
qemu-nbd -d /dev/nbd0
mkdir /tmp/test

for i in {1..20} ; do
      qemu-nbd -c /dev/nbd0 test.img
      mount /dev/nbd0 /tmp/test
      umount /dev/nbd0
      qemu-nbd -d /dev/nbd0
      sleep 0.5
done

Output of the script is similar to:

/dev/nbd0 disconnected
/dev/nbd0 disconnected
/dev/nbd0 disconnected
/dev/nbd0 disconnected
/dev/nbd0 disconnected
/dev/nbd0 disconnected
/dev/nbd0 disconnected
mount: /tmp/test: can't read superblock on /dev/nbd0.
         dmesg(1) may have more information after failed mount system call.

Can you please have a look at this issue?
I can help at testing patches.

This is just you racing with the connection being ready and the device
being ready and you trying to mount it.  The timing has changed, if
you look at this patch that I added for blk-tests you'll see the sort
of thing that needs to be done

https://github.com/osandov/blktests/commit/698f1a024cb4d69b4b6cd5500b72efa758340d05

A better option for you is to load the module with devices=0, and use
the netlink thing so that the device doesn't show up until it's
actually connected.  This problem exists because historically we used
the device itself to get configured, instead of a control device that
would then add the device once it is ready.  We can't change the old
way, but going forward to avoid this style of problem you'll want to
use nbds_max=0 and then use the netlink interface for configuration,
that'll give you a much more "normal" experience.  Thanks,

Josef

Hi Josef,

I worked around the issue as you did in blk-tests. Thanks a lot for your support.

Michel





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux