Re: LTP test df01.sh detected different size of loop device in v5.19

Petr Vorel <pvorel@xxxxxxx> · Thu, 18 Aug 2022 17:25:33 +0200




Hi Eric, all,

> On 8/15/22 4:31 AM, Petr Vorel wrote:
> > Hi Dave,

> >> On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote:
> >>> Hi all,

> >>> LTP test df01.sh found different size of loop device in v5.19.
> >>> Test uses loop device formatted on various file systems, only XFS fails.
> >>> It randomly fails during verifying that loop size usage changes:

> >>> grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1]

> >>> How to reproduce:
> >>> # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit

> >>> df saved output:
> >>> Filesystem     1024-blocks    Used Available Capacity Mounted on
> >>> ...
> >>> /dev/loop0          256672   16208    240464       7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint
> >>> df output:
> >>> Filesystem     1024-blocks    Used Available Capacity Mounted on
> >>> ...
> >>> tmpfs               201780       0    201780       0% /run/user/0
> >>> /dev/loop0          256672   15160    241512       6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint
> >>> => different size
> >>> df01 4 TFAIL: 'df -k -P' failed, not expected.

> >> Yup, most likely because we changed something in XFS related to
> >> internal block reservation spaces. That is, the test is making
> >> fundamentally flawed assumptions about filesystem used space
> >> accounting.

> >> It is wrong to assuming that the available capacity of a given empty
> >> filesystem will never change.  Assumptions like this have been
> >> invalid for decades because the available space can change based on
> >> the underlying configuration or the filesystem. e.g. different
> >> versions of mkfs.xfs set different default parameters and so simply
> >> changing the version of xfsprogs you use between the two comparision
> >> tests will make it fail....

> >> And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow
> >> userspace to change the amount of reserved blocks. They were
> >> introduced in 1997, and since then we've changed the default
> >> reservation the filesystem takes at least a dozen times.

> > Thanks a lot for valuable info.

> >>>> It might be a false positive / bug in the test, but it's at least a changed behavior.

> >> Yup, any test that assumes "available space" does not change from
> >> kernel version to kernel version is flawed. There is no guarantee
> >> that this ever stays the same, nor that it needs to stay the same.
> > I'm sorry I was not clear. Test [1] does not measure "available space" between
> > kernel releases. It just run df command with parameters, saves it's output
> > and compares "1024-blocks" and "Used" columns of df output with stat output:

> Annotating what these do...

> > 		local total=$(stat -f mntpoint --printf=%b)  # number of blocks allocated
> > 		local free=$(stat -f mntpoint --printf=%f)   # free blocks in filesystem
> > 		local used=$((total-free))                   # (number of blocks free)

> > 		local bsize=$(stat -f mntpoint --printf=%s)  # block size ("for faster transfers")
> > 		total=$((($total * $bsize + 512)/ 1024))     # number of 1k blocks allocated?
> > 		used=$((($used * $bsize + 512) / 1024))      # number of 1k blocks used?

> > And comparison with "$used" is what sometimes fails.

> > BTW this happens on both distros when loop device is on tmpfs. I'm trying to
> > trigger it on ext4 and btrfs, not successful so far. Looks like it's tmpfs
> > related.

> > If that's really expected, we might remove this check for used for XFS
> > (not sure if check only for total makes sense).

> It's kind of hard to follow this test, but it seems to be trying to
> ensure that df output is consistent with du (statfs) numbers, before
> and after creating and removing a 1MB file.  I guess it's literally
> testing df itself, i.e. that it actually presents the numbers it obtained
> from statfs.

> AFAICT the difference in the failure is 1024 1K blocks, which is the size
> the file that's been created and removed during the test.

> My best guess is that this is xfs inode deferred inode inactivation hanging
> onto the space a little longer than the test expects.

> This is probably because the new-ish xfs inode inactivation no longer blocks
> on inode garbage collection during statfs().

> IOWS, I think the test expects that free space is reflected in statfs numbers
> immediately after a file is removed, and that's no longer the case here. They
> change in between the df check and the statfs check.

> (The test isn't just checking that the values are correct, it is checking that
> the values are /immediately/ correct.)

> Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC
> the max time to wait for inodegc is 1s. This does slow the test down a bit.

Sure, it looks like we can sleep just 50ms on my hw (although better might be to
poll for the result [1]), I just wanted to make sure there is no bug/regression
before hiding it with sleep.

Thanks for your input!

Kind regards,
Petr

[1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests

> -Eric

+++ testcases/commands/df/df01.sh
@@ -63,6 +63,10 @@ df_test()
 		tst_res TFAIL "'$cmd' failed."
 	fi
 
+	if [ "$DF_FS_TYPE" = xfs ]; then
+		tst_sleep 50ms
+	fi
+
 	ROD_SILENT rm -rf mntpoint/testimg
 
 	# flush file system buffers, then we can get the actual sizes.