On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@xxxxxxx> wrote: > > On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: > > There are reports that since version 6.7 update-grub fails to find the > > device of the root on systems without initrd and on a single device. > > > > This looks like the device name changed in the output of > > /proc/self/mountinfo: > > > > 6.5-rc5 working > > > > 18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ... > > > > 6.7 not working: > > > > 17 1 0:15 / / rw,noatime - btrfs /dev/root ... > > > > and "update-grub" shows this error: > > > > /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?) > > > > This looks like it's related to the device name, but grub-probe > > recognizes the "/dev/root" path and tries to find the underlying device. > > However there's a special case for some filesystems, for btrfs in > > particular. > > > > The generic root device detection heuristic is not done and it all > > relies on reading the device infos by a btrfs specific ioctl. This ioctl > > returns the device name as it was saved at the time of device scan (in > > this case it's /dev/root). > > > > The change in 6.7 for temp_fsid to allow several single device > > filesystem to exist with the same fsid (and transparently generate a new > > UUID at mount time) was to skip caching/registering such devices. > > > > This also skipped mounted device. One step of scanning is to check if > > the device name hasn't changed, and if yes then update the cached value. > > > > This broke the grub-probe as it always read the device /dev/root and > > couldn't find it in the system. A temporary workaround is to create a > > symlink but this does not survive reboot. > > > > The right fix is to allow updating the device path of a mounted > > filesystem even if this is a single device one. > > > > In the fix, check if the device's major:minor number matches with the > > cached device. If they do, then we can allow the scan to happen so that > > device_list_add() can take care of updating the device path. The file > > descriptor remains unchanged. > > > > This does not affect the temp_fsid feature, the UUID of the mounted > > filesystem remains the same and the matching is based on device major:minor > > which is unique per mounted filesystem. > > > > This covers the path when the device (that exists for all mounted > > devices) name changes, updating /dev/root to /dev/sdx. Any other single > > device with filesystem and is not mounted is still skipped. > > > > Note that if a system is booted and initial mount is done on the > > /dev/root device, this will be the cached name of the device. Only after > > the command "btrfs device scan" it will change as it triggers the > > rename. > > > > The fix was verified by users whose systems were affected. > > > > CC: stable@xxxxxxxxxxxxxxx # 6.7+ > > Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem") > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353 > > Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@xxxxxxxxxxxxxx/ > > Signed-off-by: Anand Jain <anand.jain@xxxxxxxxxx> > > Tested-by: Alex Romosan <aromosan@xxxxxxxxx> > > Tested-by: CHECK_1234543212345@xxxxxxxxxxxxxx > > Reviewed-by: David Sterba <dsterba@xxxxxxxx> > > When you commit the patch, please reorder the tags according to > https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering So this introduces a regression. Running fstests: (...) btrfs/156 11s ... 16s btrfs/157 3s ... 0s btrfs/158 0s ... 2s btrfs/159 16s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad) --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad 2024-02-20 12:54:43.386131546 +0000 @@ -1,8 +1,11 @@ QA output created by 159 +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad option, bad superblock on /dev/mapper/flakey-test, missing codepage or helper program, or other error. + dmesg(1) may have more information after failed mount system call. File digest before power failure: -f049865ed45b1991dc9a299b47d51dbf SCRATCH_MNT/foobar +b2e8facfb4795185fadd85707fe78973 SCRATCH_MNT/foobar +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted. ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad' to see the entire diff) btrfs/160 4s ... 4s (...) The weird thing is it doesn't happen when running btrfs/159 standalone (even if doing a rmmod btrfs before). Instead of running all tests, I managed to reproduce it with only: $ ./check btrfs/14[6-9] btrfs/15[8-9] FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian0 6.8.0-rc5-btrfs-next-151+ #1 SMP PREEMPT_DYNAMIC Mon Feb 19 13:38:37 WET 2024 MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 btrfs/146 1s ... 2s btrfs/147 0s ... 1s btrfs/148 2s ... 2s btrfs/149 1s ... 1s btrfs/158 1s ... 0s btrfs/159 20s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad) --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad 2024-02-20 13:51:25.707220763 +0000 @@ -1,8 +1,11 @@ QA output created by 159 +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad option, bad superblock on /dev/mapper/flakey-test, missing codepage or helper program, or other error. + dmesg(1) may have more information after failed mount system call. File digest before power failure: -f049865ed45b1991dc9a299b47d51dbf SCRATCH_MNT/foobar +b2e8facfb4795185fadd85707fe78973 SCRATCH_MNT/foobar +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted. ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad' to see the entire diff) Ran: btrfs/146 btrfs/147 btrfs/148 btrfs/149 btrfs/158 btrfs/159 dmesg shows: [79195.239769] run fstests btrfs/159 at 2024-02-20 14:06:02 [79195.418917] BTRFS: device fsid 45ac9151-7e2d-4dd7-bf75-99967f869f2a devid 1 transid 3747 /dev/sdb scanned by mount (3413231) [79195.419515] BTRFS info (device sdb): first mount of filesystem 45ac9151-7e2d-4dd7-bf75-99967f869f2a [79195.419525] BTRFS info (device sdb): using crc32c (crc32c-intel) checksum algorithm [79195.419529] BTRFS info (device sdb): using free-space-tree [79195.612719] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828 devid 1 transid 6 /dev/sdc scanned by mkfs.btrfs (3413318) [79195.666279] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828 devid 1 transid 6 /dev/dm-0 scanned by systemd-udevd (3410982) [79195.695774] BTRFS info (device dm-0): first mount of filesystem 10184d7d-3ca9-43c1-a6f8-70b134cff828 [79195.695786] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm [79195.695789] BTRFS error (device dm-0): superblock fsid doesn't match fsid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 != 628aff33-4122-4d77-b2a9-2e9a90f27520 [79195.696098] BTRFS error (device dm-0): superblock metadata_uuid doesn't match metadata uuid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 != 628aff33-4122-4d77-b2a9-2e9a90f27520 [79195.696419] BTRFS error (device dm-0): dev_item UUID does not match metadata fsid: 628aff33-4122-4d77-b2a9-2e9a90f27520 != 10184d7d-3ca9-43c1-a6f8-70b134cff828 [79195.696765] BTRFS error (device dm-0): superblock contains fatal errors [79195.697447] BTRFS error (device dm-0): open_ctree failed It always happens with this patch applied in for-next, and never happens with it reverted. >