On Sep 11, 2023, at 12:39 PM, Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx> wrote:
>
> Invocations of resize2fs intermittently report failure due to superblock
> checksum mismatches in this author's environment.  This might happen a few
> times a week.  The following script can make this happen within minutes.
> (It assumes /dev/nvme1n1 is available and not in use by anything else).

Krister, thanks for submitting the patch.  This particular issue was already
fixed in commit v1.46.6-16-g43a498e93888, apparently based on your previous
report:

    commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
    Author:     Theodore Ts'o <tytso@xxxxxxx>
    AuthorDate: Thu Jun 15 00:17:01 2023 -0400
    Commit:     Theodore Ts'o <tytso@xxxxxxx>
    CommitDate: Thu Jun 15 00:17:01 2023 -0400

        resize2fs: use Direct I/O when reading the superblock for online resizes

        If the file system is mounted, the superblock can be changing while
        resize2fs is trying to read the superblock, resulting in checksum
        failures.  One way of avoiding this problem is read the superblock
        using Direct I/O, since the kernel makes sure that what gets written
        to disk is self-consistent.

        Suggested-by: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
        Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>

So the fix has landed on the e2fsprogs maint branch, but there has not been a
maintenance release since it landed.

Cheers, Andreas

> #!/usr/bin/bash
> set -euxo pipefail
>
> while true
> do
>     parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
>     sleep .5
>     mkfs.ext4 /dev/nvme1n1p1
>     mount -t ext4 /dev/nvme1n1p1 /mnt
>     stress-ng --temp-path /mnt -D 4 &
>     STRESS_PID=$!
>     sleep 1
>     growpart /dev/nvme1n1 1
>     resize2fs /dev/nvme1n1p1
>     kill $STRESS_PID
>     wait $STRESS_PID
>     umount /mnt
>     wipefs -a /dev/nvme1n1p1
>     wipefs -a /dev/nvme1n1
> done
>
> After trying a few possible solutions, adding an O_DIRECT read to the open
> path in resize2fs eliminated the occurrences on test systems.  ext2fs_open2
> uses a negative count value when calling io_channel_read_blk to get the
> superblock.  According to unix_read_block, negative offsets are to be read
> direct.  However, when strace-ing a program without this fix, the
> underlying device was opened without O_DIRECT.  Adding the flags in the
> patch ensures the device is opened with O_DIRECT and that the superblock
> read appears consistent.
>
> Signed-off-by: Krister Johansen <kjlx@xxxxxxxxxxxxxxxxxx>
> ---
> v2:
>   - Only set DIRECT_IO flag when resizing a mounted filesystem.  (Feedback from
>     Theodore Ts'o)
> ---
>  resize/main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/resize/main.c b/resize/main.c
> index 94f5ec6d..f914c050 100644
> --- a/resize/main.c
> +++ b/resize/main.c
> @@ -409,6 +409,8 @@ int main (int argc, char ** argv)
>
>  	if (!(mount_flags & EXT2_MF_MOUNTED) && !print_min_size)
>  		io_flags = EXT2_FLAG_RW | EXT2_FLAG_EXCLUSIVE;
> +	if (mount_flags & EXT2_MF_MOUNTED)
> +		io_flags |= EXT2_FLAG_DIRECT_IO;
>
>  	io_flags |= EXT2_FLAG_64BITS | EXT2_FLAG_THREADS;
>  	if (undo_file) {
> --
> 2.25.1

Cheers, Andreas
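
For readers unfamiliar with what an O_DIRECT read involves, the following is a minimal sketch in C, not the resize2fs code path (which goes through the libext2fs I/O channel and EXT2_FLAG_DIRECT_IO as in the diff above). The names read_sb_magic and read_first_block, and the 4096-byte alignment, are assumptions made for this illustration. It relies only on the ext2/3/4 on-disk layout: the primary superblock starts at byte 1024 and its 16-bit little-endian s_magic field (0xEF53) sits at offset 0x38. The buffered fallback is there so the sketch also runs where O_DIRECT is unsupported; a real tool might prefer to report the error instead.

```c
#define _GNU_SOURCE             /* asks glibc to expose O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* not exposed here: degrade to a buffered read */
#endif

#define SB_OFFSET    1024       /* primary superblock starts at byte 1024 */
#define SB_MAGIC_OFF 56         /* s_magic is at offset 0x38 within the superblock */
#define IO_SIZE      4096       /* assumed buffer/length alignment for direct I/O */

/* Read the first IO_SIZE bytes of path into buf using the given open flags. */
static int read_first_block(const char *path, int flags, void *buf)
{
    int fd = open(path, flags);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, buf, IO_SIZE, 0);
    close(fd);
    /* Succeed only if the bytes holding s_magic were actually read. */
    return n >= SB_OFFSET + SB_MAGIC_OFF + 2 ? 0 : -1;
}

/* Hypothetical helper: fetch s_magic, preferring a direct (uncached) read. */
int read_sb_magic(const char *path, uint16_t *magic)
{
    void *buf = NULL;
    const unsigned char *p;
    int rc;

    assert(path != NULL && magic != NULL);

    /* O_DIRECT needs an aligned buffer, offset, and length. */
    if (posix_memalign(&buf, IO_SIZE, IO_SIZE) != 0)
        return -1;

    rc = read_first_block(path, O_RDONLY | O_DIRECT, buf);
    if (rc != 0)                /* e.g. filesystem without O_DIRECT support */
        rc = read_first_block(path, O_RDONLY, buf);

    if (rc == 0) {
        p = (const unsigned char *)buf + SB_OFFSET + SB_MAGIC_OFF;
        *magic = (uint16_t)(p[0] | (p[1] << 8));   /* little-endian decode */
    }
    free(buf);
    return rc;
}
```

Calling read_sb_magic("/dev/nvme1n1p1", &magic) on an ext4 partition would be expected to return 0 and leave 0xEF53 in magic, given sufficient privileges to open the device.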