Re: [systemd-devel] [PATCH] libblkid: fix spurious ext superblock checksum mismatches

On Mon, Nov 18, 2024 at 11:36:48PM +0100, Lennart Poettering wrote:
> On Mo, 18.11.24 12:35, Krister Johansen (kjlx@xxxxxxxxxxxxxxxxxx) wrote:
> 
> > Reads of ext superblocks can race with updates.  If libblkid observes a
> > checksum mismatch, re-read the superblock with O_DIRECT in order to get
> > a consistent view of its contents.  Only if the O_DIRECT read fails the
> > checksum should it be reported to have failed.
> >
> > This fixes a problem where devices that were named by filesystem label
> > failed to be found when systemd attempted to mount them on boot.  The
> > problem was caused by systemd-udevd using libblkid. If a read of a
> > superblock resulted in a checksum mismatch, udev will remove the
> > by-label links which result in the mount call failing to find the
> > device.  The checksum mismatch that was triggering the problem was
> > spurious, and when we use O_DIRECT, or even perform a subsequent retry,
> > the superblock is correctly read.  This resulted in a failure to mount
> > /boot in one out of every 2,000 or so attempts in our environment.
> >
> > e2fsprogs fixed[1] an identical version of this bug that afflicted
> > resize2fs during online grow operations when run from cloud-init.  The
> > fix there was also to use O_DIRECT in order to read the superblock.
> > This patch uses a similar approach: read the superblock with O_DIRECT in
> > the case where a bad checksum is detected.
> 
> Umpf. udev has a clearly defined protocol to comprehensively avoid
> such issues:
> 
> https://systemd.io/BLOCK_DEVICE_LOCKING
> 
> Partitioning tools should simply follow this logic, and udev and
> programs downstream from it will not even be tempted to operate with
> half-written superblocks, partition tables or such.
> 
> Hence, I personally am not convinced of that O_DIRECT approach. First
> of all, it only works on superblocks that have a useful checksum
> covering enough relevant data, and it can never really catch scenarios
> where a disk is comprehensively repartitioned, i.e. one or more fs and
> partition metadata changed at the same time...

I may have done a poor job of explaining this.  The writer here is ext
itself, updating its own superblock from within the kernel, while a
concurrent read sees a potentially inconsistent view of that write.
O_DIRECT causes us to serialize with the locks ext4 holds when it writes
the superblock, which prevents the read from observing a partial update.

It's not necessarily the partitioning tools causing this; any
filesystem-level update that modifies the contents of the superblock
can trigger it.
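
To make the retry logic concrete, here is a rough sketch of the idea, not
the code from the patch.  The names (read_sb_checked, sb_verify_fn) and
the 4096-byte alignment are assumptions made for illustration; libblkid
has its own buffer handling and checksum routines, and real code should
query the device for its alignment requirements.

```c
#define _GNU_SOURCE           /* for O_DIRECT */
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SB_OFFSET 1024        /* ext superblock starts 1 KiB into the device */
#define SB_SIZE   1024
#define DIO_ALIGN 4096        /* assumed alignment, not queried from the device */

/* Hypothetical callback: returns true if the superblock checksum is valid. */
typedef bool (*sb_verify_fn)(const unsigned char *sb, size_t len);

/* Read the superblock through the page cache. */
static int read_sb_buffered(const char *path, unsigned char *sb)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, sb, SB_SIZE, SB_OFFSET);
    close(fd);
    return n == SB_SIZE ? 0 : -1;
}

/* Read the superblock with O_DIRECT, bypassing the page cache. */
static int read_sb_direct(const char *path, unsigned char *sb)
{
    void *buf = NULL;
    int rc = -1;
    int fd = open(path, O_RDONLY | O_CLOEXEC | O_DIRECT);
    if (fd < 0)
        return -1;
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) == 0) {
        /* Offset, length and buffer must all be aligned, so read the
         * first DIO_ALIGN bytes and copy the superblock out of them. */
        ssize_t n = pread(fd, buf, DIO_ALIGN, 0);
        if (n >= SB_OFFSET + SB_SIZE) {
            memcpy(sb, (unsigned char *)buf + SB_OFFSET, SB_SIZE);
            rc = 0;
        }
        free(buf);
    }
    close(fd);
    return rc;
}

/* Returns 0 if a superblock with a valid checksum was read, -1 otherwise. */
int read_sb_checked(const char *path, unsigned char *sb, sb_verify_fn verify)
{
    if (read_sb_buffered(path, sb) == 0 && verify(sb, SB_SIZE))
        return 0;
    /* Mismatch or short read: retry once with O_DIRECT before giving up. */
    if (read_sb_direct(path, sb) == 0 && verify(sb, SB_SIZE))
        return 0;
    return -1;
}
```

The point of the structure is that the common case stays a plain buffered
read, and the O_DIRECT path only runs after a mismatch, so a checksum
failure is reported only if the direct read fails verification as well.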

-K



