Re: [PATCH] libblkid: fix spurious ext superblock checksum mismatches

Lennart Poettering <lennart@xxxxxxxxxxxxxx> · Mon, 18 Nov 2024 23:36:48 +0100

On Mon, 18.11.24 12:35, Krister Johansen (kjlx@xxxxxxxxxxxxxxxxxx) wrote:

> Reads of ext superblocks can race with updates.  If libblkid observes a
> checksum mismatch, re-read the superblock with O_DIRECT in order to get
> a consistent view of its contents.  Only if the O_DIRECT read fails the
> checksum should it be reported to have failed.
>
> This fixes a problem where devices that were named by filesystem label
> failed to be found when systemd attempted to mount them on boot.  The
> problem was caused by systemd-udevd using libblkid.  If a read of a
> superblock results in a checksum mismatch, udev removes the by-label
> links, which results in the mount call failing to find the
> device.  The checksum mismatch that was triggering the problem was
> spurious, and when we use O_DIRECT, or even perform a subsequent retry,
> the superblock is correctly read.  This resulted in a failure to mount
> /boot in one out of every 2,000 or so attempts in our environment.
>
> e2fsprogs fixed[1] an identical version of this bug that afflicted
> resize2fs during online grow operations when run from cloud-init.  The
> fix there was also to use O_DIRECT in order to read the superblock.
> This patch uses a similar approach: read the superblock with O_DIRECT in
> the case where a bad checksum is detected.
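
For readers following along, the fallback described in the quoted patch
could be sketched roughly as below. This is only an illustration of the
control flow, not the actual libblkid change: `read_cached`,
`read_direct` and `checksum_ok` are hypothetical callables standing in
for the buffered read, the O_DIRECT re-read and the superblock checksum
verification.

```python
def probe_superblock(read_cached, read_direct, checksum_ok):
    """Return superblock bytes, or None if the checksum is really bad.

    All three arguments are hypothetical stand-ins:
      read_cached()  -> bytes from an ordinary (page cache) read
      read_direct()  -> bytes from a re-read that bypasses the page cache
      checksum_ok(b) -> True if the superblock checksum verifies
    """
    sb = read_cached()
    if checksum_ok(sb):
        return sb

    # A mismatch may just mean we raced with an in-flight update,
    # so retry once with the direct read before reporting failure.
    sb = read_direct()
    if checksum_ok(sb):
        return sb

    # Only a mismatch on the direct read is reported as a failure.
    return None
```

Note that the direct read is only attempted after a mismatch, so the
common path costs nothing extra.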

Umpf. udev has a clearly defined protocol to comprehensively avoid
such issues:

https://systemd.io/BLOCK_DEVICE_LOCKING

Partitioning tools should simply follow this logic, and udev and
programs downstream from it will not even be tempted to operate with
half-written superblocks, partition tables or such.
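
A minimal sketch of what following that protocol looks like from a
tool's side, assuming the scheme described on the page above: the
writer takes an exclusive BSD file lock (flock) on the whole-disk
device node before modifying it, and a prober takes a shared,
non-blocking lock and backs off if it cannot get one. The helper names
`locked_block_device` and `try_shared_lock` are made up for
illustration and are not part of udev or libblkid.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_block_device(path):
    """Hold an exclusive BSD lock on `path` while the caller modifies it.

    `path` is meant to be the whole-disk node (e.g. /dev/sda rather
    than /dev/sda1); this helper is a hypothetical illustration.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until no other holder (shared or exclusive) remains.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def try_shared_lock(path):
    """Prober side: return a locked fd, or None if a writer holds the lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A partitioning or mkfs-style tool would do all of its writes inside
`with locked_block_device("/dev/sdX") as fd:`, so that a prober never
gets to look at the device while it is half-written.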

Hence, I personally am not convinced by the O_DIRECT approach. First
of all, it only works for superblocks that have a useful checksum
covering enough of the relevant data, and it can never really catch
scenarios where a disk is comprehensively repartitioned, i.e. where
file system and partition metadata change at the same time...

Lennart

--
Lennart Poettering, Berlin