Re: [External] : Re: [PATCH] ext4: Regression test for corruption during on-line resize

Zorro Lang <zlang@xxxxxxxxxx> · Sat, 14 Sep 2024 12:40:05 +0800

On Thu, Sep 12, 2024 at 05:53:17AM +0000, Srivathsa Dara wrote:
> Hi Zorro,
> 
> > > Regression test for:
> > > 	a6b3bfe176e8 ext4: fix corruption during on-line resize
> > > 
> > > Signed-off-by: Srivathsa Dara <srivathsa.d.dara@xxxxxxxxxx>
> > > ---
> > >  tests/ext4/060     | 43 +++++++++++++++++++++++++++++++++++++++++++
> > >  tests/ext4/060.out |  2 ++
> > >  2 files changed, 45 insertions(+)
> > >  create mode 100755 tests/ext4/060
> > >  create mode 100644 tests/ext4/060.out
> > > 
> > > diff --git a/tests/ext4/060 b/tests/ext4/060
> > > new file mode 100755
> > > index 00000000..440748ea
> > > --- /dev/null
> > > +++ b/tests/ext4/060
> > > @@ -0,0 +1,43 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> > > +#
> > > +# FS QA Test 060
> > > +#
> > > +# This test ensures that kernel avoids FS corruption while online
> > > +# resizing an ext4 filesystem with disabled resize_inode feature.
> > > +#
> > > +# The commit a6b3bfe176e8 ("ext4: fix corruption during on-line resize")
> > > +# stops the corruption.
> > > +#
> > > +
> > > +. ./common/preamble
> > > +_begin_fstest auto resize quick
> > > +
> > > +_supported_fs ext4
> > > +_fixed_by_kernel_commit a6b3bfe176e8 \
> > > +	"ext4: fix corruption during on-line resize"
> > > +
> > > +_require_command "$RESIZE2FS_PROG" resize2fs
> > > +_require_command "$E2FSCK_PROG" e2fsck
> > > +_require_scratch_size_nocheck $((9* 1024 * 1024))
> > > +
> > > +# Initialize an EXT4 filesystem with the resize_inode feature disabled,
> > > +# and a size of 128MiB less than 8GiB, i.e., short of 1 block group in
> > > +# an 8GiB filesystem.
> > > +
> > > +dev_size=$((8* 1024 * 1024 * 1024 - 128 * 1024 * 1024)) 
> > > +MKFS_OPTIONS="-O ^resize_inode $MKFS_OPTIONS" _scratch_mkfs_sized $dev_size \
> > > +	>>$seqres.full 2>&1
> > 
> > Just to be sure, is the 8G fs size a necessary requirement to reproduce the bug?
> > Is it related to the block size (e.g. do different block sizes need different fs sizes for testing)?
> 
> Yes, the issue was that, before the fix, while performing an online
> resize of a filesystem with the resize_inode feature disabled, the
> kernel would corrupt the first block of the last group in the 0th meta_bg.
> 
> If the block size is 1024, each meta_bg has 16 (1024/64) block groups.
> The last blockgroup of the 0th meta_bg is 15, and the block that gets
> corrupted is the 122,880th block, which is the first block of the 15th block group.
> 
> If the block size is 2048, each meta_bg has 32 (2048/64) block groups.
> The last blockgroup of the 0th meta_bg is 31, and the block that gets
> corrupted is the 507,904th block, which is the first block of the 31st block group.
> 
> If the block size is 4096, each meta_bg has 64 (4096/64) block groups.
> The last blockgroup of the 0th meta_bg is 63, and the block that gets
> corrupted is the 2,064,384th block, which is the first block of the 63rd block group.
> 
> The corruption occurred because, when updating backup group descriptors,
> the kernel failed to check whether the group descriptor being updated belonged
> to the meta_bg layout or not, leading to an incorrect block being updated.
> Hence, the corruption.
> 
> To reproduce the issue, the initial filesystem's descriptor block must have
> some available space, and the resize operation should increase the filesystem
> size enough to cross the meta_bg boundary.
> 
> In this case, the initial filesystem size was chosen as 8GiB minus 128MiB,
> so that its descriptor block has space to accommodate an additional block group.
> The filesystem is then resized to 9GiB (8GiB is the meta_bg boundary).

Actually, I'd like to check the MKFS_OPTIONS handling with you. I see you carry
over the existing MKFS_OPTIONS: MKFS_OPTIONS="-O ^resize_inode $MKFS_OPTIONS". If
MKFS_OPTIONS="-b 65536" (or something else), can this bug still be reproduced? If
not, we can drop the old MKFS_OPTIONS and just set MKFS_OPTIONS="-O ^resize_inode".
Otherwise we can keep it.
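
For reference, the block-size arithmetic quoted above can be sketched in a few
lines of shell. This is only an illustration, not part of the patch: it assumes
64-byte group descriptors and the default 8 * block_size blocks per group, and
it ignores the first data block offset used with 1024-byte blocks. The function
name meta_bg_geometry is made up for this sketch.

```shell
#!/bin/bash
# Sketch: where the 0th meta_bg ends for a given block size.
# Assumptions (not taken from the patch): 64-byte group descriptors and
# the default 8 * block_size blocks per group.
desc_size=64

meta_bg_geometry() {
	local bs=$1
	local groups_per_meta_bg=$((bs / desc_size))
	local blocks_per_group=$((8 * bs))
	local last_group=$((groups_per_meta_bg - 1))
	# First block of the last group in the 0th meta_bg
	local first_block=$((last_group * blocks_per_group))
	# Filesystem size at which the 0th meta_bg is full, in MiB
	local boundary_mib=$((groups_per_meta_bg * blocks_per_group * bs / 1024 / 1024))

	echo "bs=$bs groups_per_meta_bg=$groups_per_meta_bg last_group=$last_group first_block=$first_block boundary_mib=$boundary_mib"
}

for bs in 1024 2048 4096; do
	meta_bg_geometry $bs
done
```

If these assumptions hold, the 8GiB boundary only applies to 4096-byte blocks;
with 1024-byte or 2048-byte blocks the same calculation puts the boundary at
128MiB and 1GiB, so an inherited "-b" option would change the sizes the test
needs.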

> 
> > 
> > > +
> > > +_scratch_mount
> > > +
> > > +# Perform online-resize
> > > +$RESIZE2FS_PROG $SCRATCH_DEV 9G >> $seqres.full 2>&1
> > > +
> > > +$E2FSCK_PROG -fn $SCRATCH_DEV >> $seqres.full 2>&1 || _fail "Filesystem corrupted"
> > 
> > Do you want to test online resize or online fsck or both? (Does ext4 support online fsck?)
> 
> I want to test only the online resize. No, EXT4 doesn't support online fsck.
> However, e2fsck with the -fn option reports corruption even if the device
> is mounted.

As EXT4 doesn't support online fsck, it's better to unmount before running fsck,
to avoid other interference.
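
Something along these lines is what I have in mind. It is only a rough sketch
of the tail of the test, assuming the usual _scratch_unmount helper from
common/rc, and it has not been run:

```shell
# Sketch only: this fragment runs inside the fstests harness, not standalone.
# _scratch_unmount, _fail, $RESIZE2FS_PROG, $E2FSCK_PROG, $SCRATCH_DEV and
# $seqres are provided by the harness (common/rc, common/preamble).

# Perform online-resize while the scratch filesystem is still mounted
$RESIZE2FS_PROG $SCRATCH_DEV 9G >> $seqres.full 2>&1

# Unmount first so that e2fsck checks a device nothing else is writing to
_scratch_unmount
$E2FSCK_PROG -fn $SCRATCH_DEV >> $seqres.full 2>&1 || _fail "Filesystem corrupted"
```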

Thanks,
Zorro

> 
> Thanks,
> Srivathsa
> 
> > 
> > > +
> > > +echo "Silence is golden"
> > > +
> > > +status=0
> > > +exit
> > > diff --git a/tests/ext4/060.out b/tests/ext4/060.out
> > > new file mode 100644
> > > index 00000000..8ffce4de
> > > --- /dev/null
> > > +++ b/tests/ext4/060.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 060
> > > +Silence is golden
> > > --
> > > 2.39.3
> > > 
> > >
>