On Mon, Oct 23, 2023 at 09:16:52PM +0800, Zorro Lang wrote:
> On Sun, Oct 22, 2023 at 09:46:47PM -0700, Darrick J. Wong wrote:
> > On Sun, Oct 22, 2023 at 02:18:34PM +0800, Zorro Lang wrote:
> > > On Sat, Oct 21, 2023 at 04:00:24PM -0700, Darrick J. Wong wrote:
> > > > On Sat, Oct 21, 2023 at 09:14:48PM +0800, Zorro Lang wrote:
> > > > > On Thu, Oct 19, 2023 at 07:36:27AM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > >
> > > > > > Every now and then, this test fails with the following output when
> > > > > > running against my development tree when configured with an 8k fs block
> > > > > > size:
> > > > > >
> > > > > > --- a/tests/generic/251.out 2023-07-11 12:18:21.624971186 -0700
> > > > > > +++ b/tests/generic/251.out.bad 2023-10-15 20:54:44.636000000 -0700
> > > > > > @@ -1,2 +1,4677 @@
> > > > > >  QA output created by 251
> > > > > >  Running the test: done.
> > > > > > +fstrim: /opt: FITRIM ioctl failed: Invalid argument
> > > > > > +fstrim: /opt: FITRIM ioctl failed: Invalid argument
> > > > > > ...
> > > > > > +fstrim: /opt: FITRIM ioctl failed: Invalid argument
> > > > > >
> > > > > > Dumping the exact fstrim command lines to seqres.full produces this at
> > > > > > the end:
> > > > > >
> > > > > > /usr/sbin/fstrim -m 32544k -o 30247k -l 4k /opt
> > > > > > /usr/sbin/fstrim -m 32544k -o 30251k -l 4k /opt
> > > > > > ...
> > > > > > /usr/sbin/fstrim -m 32544k -o 30255k -l 4k /opt
> > > > > >
> > > > > > The count of failure messages is the same as the count as the "-l 4k"
> > > > > > fstrim invocations. Since this is an 8k-block filesystem, the -l
> > > > > > parameter is clearly incorrect. The test computes random -m and -l
> > > > > > options.
> > > > > >
> > > > > > Therefore, create helper functions to guess at the minimum and maximum
> > > > > > length and minlen parameters that can be used with the fstrim program.
> > > > > > In the inner loop of the test, make sure that our choices for -m and -l
> > > > > > fall within those constraints.
> > > > > >
> > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > > ---
> > > > > >
> > > > > Hi Darrick, with this patch I 100% hit below failure (on default 4k xfs
> > > > > and ext4):
> > > > >
> > > > > # ./check generic/251
> > > > > FSTYP         -- xfs (debug)
> > > > > PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 6.6.0-rc6-mainline+ #7 SMP PREEMPT_DYNAMIC Thu Oct 19 22:34:28 CST 2023
> > > > > MKFS_OPTIONS  -- -f /dev/loop0
> > > > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop0 /mnt/scratch
> > > > >
> > > > > generic/251 260s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests/results//generic/251.out.bad)
> > > > >     --- tests/generic/251.out 2022-04-29 23:07:23.263498297 +0800
> > > > >     +++ /root/git/xfstests/results//generic/251.out.bad 2023-10-21 21:02:37.687088360 +0800
> > > > >     @@ -1,2 +1,5 @@
> > > > >      QA output created by 251
> > > > >      Running the test: done.
> > > > >     +5834a5835
> > > > >     +> aa60581221897d3d7dd60458e1cca2fa ./results/generic/251.full
> > > > >     +!!!Checksums has changed - Filesystem possibly corrupted!!!\n
> > > > >     ...
> > > > >     (Run 'diff -u /root/git/xfstests/tests/generic/251.out /root/git/xfstests/results//generic/251.out.bad' to see the entire diff)
> > > > >
> > > > Huh. I don't see that on ext4 on my machine. Can you send me all your
> > > >
> > > The failure on ext4:
> > >
> > > # ./check generic/251
> > > FSTYP         -- ext4
> > > PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 6.6.0-rc6-mainline+ #7 SMP PREEMPT_DYNAMIC Thu Oct 19 22:34:28 CST 2023
> > > MKFS_OPTIONS  -- -F /dev/loop0
> > > MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:root_t:s0 /dev/loop0 /mnt/scratch
> > >
> > > generic/251 249s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests/results//generic/251.out.bad)
> > >     --- tests/generic/251.out 2022-04-29 23:07:23.263498297 +0800
> > >     +++ /root/git/xfstests/results//generic/251.out.bad 2023-10-22 14:17:07.248059405 +0800
> > >     @@ -1,2 +1,5 @@
> > >      QA output created by 251
> > >      Running the test: done.
> > >     +5838a5839
> > >     +> aa60581221897d3d7dd60458e1cca2fa ./results/generic/251.full
> > >     +!!!Checksums has changed - Filesystem possibly corrupted!!!\n
> > >     ...
> > >     (Run 'diff -u /root/git/xfstests/tests/generic/251.out /root/git/xfstests/results//generic/251.out.bad' to see the entire diff)
> > > Ran: generic/251
> > > Failures: generic/251
> > > Failed 1 of 1 tests
> > >
> > > > /root/git/xfstests/results//generic/251* files so that I can have a
> > > > look?
> > >
> > > Sure, thanks! There're .full and .out.bad files:
> > >
> > > # cat results/generic/251.full
> > > MINLEN max=100000 min=2
> > > LENGTH max=100000 min=4
> > > # cat results/generic/251.out.bad
> > > QA output created by 251
> > > Running the test: done.
> > > 5833a5834
> > > > aa60581221897d3d7dd60458e1cca2fa ./results/generic/251.full
> > > !!!Checksums has changed - Filesystem possibly corrupted!!!\n

Hang on, why is $seqres.full being included in the generic/251 integrity
checks?

*OH* it's this piece that runs before we start the fstrim loop:

	content=$here

	(
	cd $content
	find -P . -xdev -type f -print0 | xargs -0 md5sum | sort -o $tmp/content.sums
	)

If you don't explicitly set RESULT_BASE before running fstests, you get
the default setting of:

	export RESULT_BASE="$here/results/"

Hence $seqres.full is really $here/results/generic/251.full. Logging
the MINLEN/LENGTH settings to $seqres.full changes the contents of the
file, which changes the contents of the directory tree that we
repeatedly copy into $SCRATCH_MNT while racing with FSTRIM. That's the
cause of the "Checksums has changed" message.
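
To see the effect in isolation, here is a minimal sketch (not the test
itself). It assumes a throwaway temporary directory standing in for
$here, with an illustrative tests/ and results/ layout, and reuses the
same find | md5sum | sort pipeline to snapshot the tree before and after
a log line is appended to the .full file:

```shell
#!/bin/bash
# Assumption: a temporary directory stands in for $here; the file names
# below are illustrative and not taken from a real fstests checkout.
here=$(mktemp -d)
mkdir -p "$here/tests" "$here/results/generic"
echo "test body" > "$here/tests/251"

sums_before=$(mktemp)
sums_after=$(mktemp)

# Record sorted checksums of every regular file under $here, using the
# same pipeline that generic/251 runs before the fstrim loop.
snapshot() {
	( cd "$here" && find -P . -xdev -type f -print0 | xargs -0 md5sum | sort -o "$1" )
}

snapshot "$sums_before"

# Appending to the log creates (or, if it already existed, modifies) a
# file inside the very tree that was just checksummed.
echo "MINLEN max=100000 min=2" >> "$here/results/generic/251.full"

snapshot "$sums_after"

# diff exits non-zero when the listings differ, so tolerate that here.
delta=$(diff "$sums_before" "$sums_after" || true)
echo "$delta"

rm -rf "$here" "$sums_before" "$sums_after"
```

The diff output names ./results/generic/251.full, which is the same
shape as the extra line reported in 251.out.bad.
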
The checksums differ because the logging messages I added cause the
test to change its own outcome by observing it. I never noticed because
my test setup sets RESULT_BASE to an NFS server so that I always get
/something/ to look at, even if the node crashes.

Not sure what you want to do about this, because this is a subtle bug
for test authors to fall into. I guess a mitigation could be to _notrun
if $seqres.full is under $here but ... yeughck.

Thoughts?

--D

> > >
> > > The SCRATCH_DEV is loop0, its info as below:
> > > # xfs_info /dev/loop0
> > > meta-data=/dev/loop0             isize=512    agcount=4, agsize=720896 blks
> > >          =                       sectsz=512   attr=2, projid32bit=1
> > >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> > >          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
> > > data     =                       bsize=4096   blocks=2883584, imaxpct=25
> > >          =                       sunit=0      swidth=0 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > > log      =internal log           bsize=4096   blocks=16384, version=2
> > >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> >
> > Huh. What filesystem contains the file that /dev/loop0 points to?
>
> A xfs, but with multi-stripes:
>
> # xfs_info /
> meta-data=/dev/mapper/fedora_hp--dl380pg8--01-root isize=512    agcount=16, agsize=8192000 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
> data     =                       bsize=4096   blocks=131072000, imaxpct=25
>          =                       sunit=64     swidth=64 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=64000, version=2
>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> >
> > --D
> >
> > > More other information:
> > > # rpm -qf /usr/sbin/fstrim
> > > util-linux-2.39.2-1.fc40.x86_64
> > > # uname -r
> > > 6.6.0-rc6-mainline+
> > > # rpm -q xfsprogs
> > > xfsprogs-6.4.0-1.fc39.x86_64
> > >
> > > Thanks,
> > > Zorro
> > >
> > > >
> > > > --D
> > > >
> > > > > Ran: generic/251
> > > > > Failures: generic/251
> > > > > Failed 1 of 1 tests
> > > > >
> > > > > And test passed without this patch.
> > > > >
> > > > > # ./check generic/251
> > > > > FSTYP         -- xfs (debug)
> > > > > PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 6.6.0-rc6-mainline+ #7 SMP PREEMPT_DYNAMIC Thu Oct 19 22:34:28 CST 2023
> > > > > MKFS_OPTIONS  -- -f /dev/loop0
> > > > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop0 /mnt/scratch
> > > > >
> > > > > generic/251 260s ... 249s
> > > > > Ran: generic/251
> > > > > Passed all 1 tests
> > > > >
> > > > > Thanks,
> > > > > Zorro
> > > > >
> > > > > > tests/generic/251 | 59 ++++++++++++++++++++++++++++++++++++++++++++++-------
> > > > > > 1 file changed, 51 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/tests/generic/251 b/tests/generic/251
> > > > > > index 8ee74980cc..40cfd7c381 100755
> > > > > > --- a/tests/generic/251
> > > > > > +++ b/tests/generic/251
> > > > > > @@ -53,14 +53,46 @@ _fail()
> > > > > >  	kill $mypid 2> /dev/null
> > > > > >  }
> > > > > >
> > > > > > -_guess_max_minlen()
> > > > > > +# Set FSTRIM_{MIN,MAX}_MINLEN to the lower and upper bounds of the -m(inlen)
> > > > > > +# parameter to fstrim on the scratch filesystem.
> > > > > > +set_minlen_constraints()
> > > > > >  {
> > > > > > -	mmlen=100000
> > > > > > -	while [ $mmlen -gt 1 ]; do
> > > > > > +	local mmlen
> > > > > > +
> > > > > > +	for ((mmlen = 100000; mmlen > 0; mmlen /= 2)); do
> > > > > >  		$FSTRIM_PROG -l $(($mmlen*2))k -m ${mmlen}k $SCRATCH_MNT &> /dev/null && break
> > > > > > -		mmlen=$(($mmlen/2))
> > > > > >  	done
> > > > > > -	echo $mmlen
> > > > > > +	test $mmlen -gt 0 || \
> > > > > > +		_notrun "could not determine maximum FSTRIM minlen param"
> > > > > > +	FSTRIM_MAX_MINLEN=$mmlen
> > > > > > +
> > > > > > +	for ((mmlen = 1; mmlen < FSTRIM_MAX_MINLEN; mmlen *= 2)); do
> > > > > > +		$FSTRIM_PROG -l $(($mmlen*2))k -m ${mmlen}k $SCRATCH_MNT &> /dev/null && break
> > > > > > +	done
> > > > > > +	test $mmlen -le $FSTRIM_MAX_MINLEN || \
> > > > > > +		_notrun "could not determine minimum FSTRIM minlen param"
> > > > > > +	FSTRIM_MIN_MINLEN=$mmlen
> > > > > > +}
> > > > > > +
> > > > > > +# Set FSTRIM_{MIN,MAX}_LEN to the lower and upper bounds of the -l(ength)
> > > > > > +# parameter to fstrim on the scratch filesystem.
> > > > > > +set_length_constraints()
> > > > > > +{
> > > > > > +	local mmlen
> > > > > > +
> > > > > > +	for ((mmlen = 100000; mmlen > 0; mmlen /= 2)); do
> > > > > > +		$FSTRIM_PROG -l ${mmlen}k $SCRATCH_MNT &> /dev/null && break
> > > > > > +	done
> > > > > > +	test $mmlen -gt 0 || \
> > > > > > +		_notrun "could not determine maximum FSTRIM length param"
> > > > > > +	FSTRIM_MAX_LEN=$mmlen
> > > > > > +
> > > > > > +	for ((mmlen = 1; mmlen < FSTRIM_MAX_LEN; mmlen *= 2)); do
> > > > > > +		$FSTRIM_PROG -l ${mmlen}k $SCRATCH_MNT &> /dev/null && break
> > > > > > +	done
> > > > > > +	test $mmlen -le $FSTRIM_MAX_LEN || \
> > > > > > +		_notrun "could not determine minimum FSTRIM length param"
> > > > > > +	FSTRIM_MIN_LEN=$mmlen
> > > > > >  }
> > > > > >
> > > > > >  ##
> > > > > > @@ -70,13 +102,24 @@ _guess_max_minlen()
> > > > > >  ##
> > > > > >  fstrim_loop()
> > > > > >  {
> > > > > > +	set_minlen_constraints
> > > > > > +	set_length_constraints
> > > > > > +	echo "MINLEN max=$FSTRIM_MAX_MINLEN min=$FSTRIM_MIN_MINLEN" >> $seqres.full
> > > > > > +	echo "LENGTH max=$FSTRIM_MAX_LEN min=$FSTRIM_MIN_LEN" >> $seqres.full
> > > > > > +
> > > > > >  	trap "_destroy_fstrim; exit \$status" 2 15
> > > > > >  	fsize=$(_discard_max_offset_kb "$SCRATCH_MNT" "$SCRATCH_DEV")
> > > > > > -	mmlen=$(_guess_max_minlen)
> > > > > >
> > > > > >  	while true ; do
> > > > > > -		step=$((RANDOM*$RANDOM+4))
> > > > > > -		minlen=$(((RANDOM*($RANDOM%2+1))%$mmlen))
> > > > > > +		while true; do
> > > > > > +			step=$((RANDOM*$RANDOM+4))
> > > > > > +			test "$step" -ge "$FSTRIM_MIN_LEN" && break
> > > > > > +		done
> > > > > > +		while true; do
> > > > > > +			minlen=$(( (RANDOM * (RANDOM % 2 + 1)) % FSTRIM_MAX_MINLEN ))
> > > > > > +			test "$minlen" -ge "$FSTRIM_MIN_MINLEN" && break
> > > > > > +		done
> > > > > > +
> > > > > >  		start=$RANDOM
> > > > > >  		if [ $((RANDOM%10)) -gt 7 ]; then
> > > > > >  			$FSTRIM_PROG $SCRATCH_MNT &
> > > > > >
> > > > >
> > > >
> > >
> >
>
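
For reference, a rough sketch of the mitigation floated in the reply
above (skip the test when $seqres.full sits under $here).
path_is_under is a hypothetical helper, not an existing fstests
function, and the paths are example values modeled on the defaults
quoted in this thread. The check is purely lexical, so symlinks and ".."
components are not resolved.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): succeeds if path $1 is
# lexically inside directory $2. A trailing slash on $2 is ignored.
path_is_under() {
	local path="$1"
	local dir="${2%/}"

	case "$path" in
	"$dir"/*) return 0 ;;
	*) return 1 ;;
	esac
}

# Example values only, modeled on the default RESULT_BASE setting.
here=/root/git/xfstests
RESULT_BASE="$here/results/"
seqres="$RESULT_BASE/generic/251"

if path_is_under "$seqres.full" "$here"; then
	# A real test would presumably call _notrun at this point.
	verdict="skip: $seqres.full is inside the checksummed tree"
else
	verdict="ok to run"
fi
echo "$verdict"
```

With RESULT_BASE pointed somewhere outside $here (an NFS mount, for
example), the same check falls through to the "ok to run" branch.
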