Re: [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53

Hi Frederick!

Nice to see you joining the kdevops gang :)

On Sat, Sep 16, 2023 at 2:49 AM Frederick Lawler <fred@xxxxxxxxxxxxxx> wrote:
>
> In an effort to test and prepare patches from XFS to stable 6.1.y [1], I needed
> to make a baseline for v6.1.53 to verify that the backported patches do not
> introduce regressions (if any). However, after a 'make fstests-baseline', we
> observed that compared to v6.1.42, v6.1.53 introduced more than expected
> expunges to XFS. This RFC is an attempt to put some eyes to this and open up a
> discussion.

I refreshed the v6.1.42 expunge list very recently against up-to-date fstests:

commit 0b58b02f08d26ea23b6ff58d9b24488c266f32d0
Author: Amir Goldstein <amir73il@xxxxxxxxx>
Date:   Sat Aug 12 12:29:57 2023 +0300

    xfs: expunge new failing tests

    After update of fstests branch to tag v2023.08.06

There are zero changes in xfs code between v6.1.42..v6.1.53, so the
regressions you observed are unlikely to be due to a code change.

If it is not easy for you to test on a v6.1.42 k8s host, I can re-run the
baseline loop with a v6.1.53 kernel to verify there are no regressions, but
I am betting there won't be any. So the failures you are seeing are most
likely due to some difference between our setups.

Note that when I started using kdevops with libvirt, we observed many
random errors that were eventually attributed to faulty code in the qemu
nvme driver.

I am not ruling out the possibility that the expunge lists that Luis or I
prepared for xfs in some versions (5.10.y, 6.1.y, etc.) are tainted with
failures related to our specific setups.

AFAIK, we never bothered to create two different baselines from scratch in
two different envs (e.g. libvirt and GCE/OCI) and compare them.
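
Such a comparison could be sketched roughly as below. This is only an
illustration: it assumes each expunge file holds one test name per line
(e.g. xfs/075), optionally followed by a '#' comment, and the names
parse_expunges and compare_baselines are hypothetical, not part of kdevops.

```python
def parse_expunges(text):
    """Return the set of test names listed in one expunge file's contents.

    Assumed format: one test per line, optional trailing '# comment'.
    """
    tests = set()
    for line in text.splitlines():
        # Drop any trailing comment, then surrounding whitespace.
        name = line.split("#", 1)[0].strip()
        if name:
            tests.add(name)
    return tests


def compare_baselines(env_a, env_b):
    """Split two sets of expunged tests into only-A, only-B and shared."""
    return {
        "only_a": sorted(env_a - env_b),
        "only_b": sorted(env_b - env_a),
        "common": sorted(env_a & env_b),
    }


# Made-up sample data, not real results from either environment.
libvirt = parse_expunges("generic/475 # flaky\nxfs/075\n")
k8s = parse_expunges("xfs/075\nxfs/168 # unexplained\n\n# comment only\n")
diff = compare_baselines(libvirt, k8s)
print(diff)
```

Tests that land in "only_b" are the ones that would need analysis before
being added to a shared expunge list.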

But as it is, you already have my baseline from libvirt/kvm -
I don't think that it makes sense to add failures caused by a test env
change to the 6.1.y expunge lists, unless you are able to prove that either:
1. Those tests did not run in my env
2. Your env manages to expose a bug that my env did not expose

I can help with #1 by committing results from a run in my env.
#2 is harder - you will need to analyse the failures in your env
and understand them.

Whenever I see new failures, I always analyse them before adding
to the expunge list and I try to add a comment explaining either the
observed reason for failure or the missing fix if I know it.

>
> At Cloudflare, the Linux team does not have an easy way to obtain dedicated and
> easily configurable server infrastructure to execute kdevops filesystem testing,
> but we do have an easily-configurable kubernetes infrastructure. I prepared a
> POC to spin up virtual machines [2] in kubernetes to emulate what terraform
> may do for OpenStack, Azure, AWS, etc... to perform this test. Therefore, the
> configuration option is set to SKIP_BRINGUP=y
>
> In this baseline, I spun up XFS workflow nodes for:
> - xfs_crc
> - xfs_logdev
> - xfs_nocrc
> - xfs_nocrc_512
> - xfs_reflink
> - xfs_reflink_1024
> - xfs_reflink_normapbt
> - xfs_rtdev
>
> Each node is running a vanilla-stable 6.1.y (6.1.53), and the image is based on
> latest Debian SID [3]. Each node also has its own dedicated /data and /media
> partitions to store Linux, fstests, etc... and sparse-images respectfully.
>
> In v6.1.42, we don't currently have expunges for xfs_reflink_normapbt, and
> xfs_reflink. So those are _new_. The rest had significant additions. However,
> not all nodes finished their testing after >12hrs of run time. Some appeared to
> be stuck, in particular xfs_rtdev, and never finished (reason unknown).
> I CTRL+C and ran 'make fstests-results'.
>
> I prepared a fork [4] where the results 6.1.53.xz can be found.
>
> These patches are based on top of commit 0ec98182f4a9 ("bootlinux/fstests:
> remove odd hplip user")
>
> Links:
> 1: https://lore.kernel.org/all/CAOQ4uxgvawD4=4g8BaRiNvyvKN1oreuov_ie6sK6arq3bf8fxw@xxxxxxxxxxxxxx/
> 2: https://kubevirt.io/api-reference/v1.0.0/definitions.html#_v1_virtualmachine
> 3: https://cloud.debian.org/images/cloud/sid/daily/latest/ (debian-sid-genericcloud-amd64-daily.qcow2)
> 4: https://github.com/fredlawl/kdevops/commit/afcb8fe7c4498d2be5386e191db3534f651a3730#diff-0677846133ad9128bf752f674b3c8da437c12ce28f48d8890b9f66d0dcb3717c
>
> Frederick Lawler (2):
>   fstests/xfs: copy 6.1.42 baseline for v6.1.53

In this commit you also copied the ext4 and btrfs expunge lists.
That is not needed, as you are not changing them and do not intend to.

I don't think that forking the xfs lists is going to be needed at all
once you have verified what happened - if your findings are indeed
correct, they probably belong in the v6.1.42 expunge list.

>   xfs: merge common expunge lists for v6.1.53

The title of this commit does not represent the change correctly.
What this commit does is add many new tests to the 6.1.53
expunge list.

Your confusion must come from seeing my commits like:
8745d44 xfs: merge common expunge lists for v6.1.42

What these commits do is to merge common failures
in xfs_* config specific expunge lists into the common all.txt
expunge list - there are scripts that do that:
./scripts/workflows/fstests/{find,remove}-common-failures.sh
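
As a rough illustration of the idea only (this is not the logic of the
scripts above): assuming "common" means a test is expunged in every xfs_*
section, the merge amounts to a set intersection. The function name
merge_common and the sample data are hypothetical.

```python
def merge_common(per_config):
    """Move tests expunged in every config into a shared list.

    per_config maps a section name (e.g. 'xfs_crc') to a set of test names.
    Returns (common, remaining), where remaining has the common tests removed.
    """
    if not per_config:
        return set(), {}
    # Assumption: "common" means present in all per-config lists.
    common = set.intersection(*per_config.values())
    remaining = {name: tests - common for name, tests in per_config.items()}
    return common, remaining


# Made-up sample data for illustration.
lists = {
    "xfs_crc": {"xfs/075", "generic/475"},
    "xfs_nocrc": {"xfs/075", "xfs/168"},
    "xfs_reflink": {"xfs/075"},
}
common, remaining = merge_common(lists)
print(sorted(common))  # tests that would move to all.txt
print({name: sorted(tests) for name, tests in remaining.items()})
```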

Thanks,
Amir.



