Hi Eric, all, > On 8/15/22 4:31 AM, Petr Vorel wrote: > > Hi Dave, > >> On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote: > >>> Hi all, > >>> LTP test df01.sh found different size of loop device in v5.19. > >>> Test uses loop device formatted on various file systems, only XFS fails. > >>> It randomly fails during verifying that loop size usage changes: > >>> grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1] > >>> How to reproduce: > >>> # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit > >>> df saved output: > >>> Filesystem 1024-blocks Used Available Capacity Mounted on > >>> ... > >>> /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > >>> df output: > >>> Filesystem 1024-blocks Used Available Capacity Mounted on > >>> ... > >>> tmpfs 201780 0 201780 0% /run/user/0 > >>> /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint > >>> => different size > >>> df01 4 TFAIL: 'df -k -P' failed, not expected. > >> Yup, most likely because we changed something in XFS related to > >> internal block reservation spaces. That is, the test is making > >> fundamentally flawed assumptions about filesystem used space > >> accounting. > >> It is wrong to assuming that the available capacity of a given empty > >> filesystem will never change. Assumptions like this have been > >> invalid for decades because the available space can change based on > >> the underlying configuration or the filesystem. e.g. different > >> versions of mkfs.xfs set different default parameters and so simply > >> changing the version of xfsprogs you use between the two comparision > >> tests will make it fail.... > >> And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow > >> userspace to change the amount of reserved blocks. They were > >> introduced in 1997, and since then we've changed the default > >> reservation the filesystem takes at least a dozen times. > > Thanks a lot for valuable info. > >>>> It might be a false positive / bug in the test, but it's at least a changed behavior. > >> Yup, any test that assumes "available space" does not change from > >> kernel version to kernel version is flawed. There is no guarantee > >> that this ever stays the same, nor that it needs to stay the same. > > I'm sorry I was not clear. Test [1] does not measure "available space" between > > kernel releases. It just run df command with parameters, saves it's output > > and compares "1024-blocks" and "Used" columns of df output with stat output: > Annotating what these do... > > local total=$(stat -f mntpoint --printf=%b) # number of blocks allocated > > local free=$(stat -f mntpoint --printf=%f) # free blocks in filesystem > > local used=$((total-free)) # (number of blocks free) > > local bsize=$(stat -f mntpoint --printf=%s) # block size ("for faster transfers") > > total=$((($total * $bsize + 512)/ 1024)) # number of 1k blocks allocated? > > used=$((($used * $bsize + 512) / 1024)) # number of 1k blocks used? > > And comparison with "$used" is what sometimes fails. > > BTW this happens on both distros when loop device is on tmpfs. I'm trying to > > trigger it on ext4 and btrfs, not successful so far. Looks like it's tmpfs > > related. > > If that's really expected, we might remove this check for used for XFS > > (not sure if check only for total makes sense). > It's kind of hard to follow this test, but it seems to be trying to > ensure that df output is consistent with du (statfs) numbers, before > and after creating and removing a 1MB file. I guess it's literally > testing df itself, i.e. that it actually presents the numbers it obtained > from statfs. > AFAICT the difference in the failure is 1024 1K blocks, which is the size > the file that's been created and removed during the test. > My best guess is that this is xfs inode deferred inode inactivation hanging > onto the space a little longer than the test expects. > This is probably because the new-ish xfs inode inactivation no longer blocks > on inode garbage collection during statfs(). > IOWS, I think the test expects that free space is reflected in statfs numbers > immediately after a file is removed, and that's no longer the case here. They > change in between the df check and the statfs check. > (The test isn't just checking that the values are correct, it is checking that > the values are /immediately/ correct.) > Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC > the max time to wait for inodegc is 1s. This does slow the test down a bit. Sure, it looks like we can sleep just 50ms on my hw (although better might be to poll for the result [1]), I just wanted to make sure there is no bug/regression before hiding it with sleep. Thanks for your input! Kind regards, Petr [1] https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests > -Eric +++ testcases/commands/df/df01.sh @@ -63,6 +63,10 @@ df_test() tst_res TFAIL "'$cmd' failed." fi + if [ "$DF_FS_TYPE" = xfs ]; then + tst_sleep 50ms + fi + ROD_SILENT rm -rf mntpoint/testimg # flush file system buffers, then we can get the actual sizes.