Re: xfstests results over NFS

On 8/17/23 2:07 PM, Jeff Layton wrote:
On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:

On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:

I finally got my kdevops (https://github.com/linux-kdevops/kdevops) test
rig working well enough to get some publishable results. To run fstests,
kdevops will spin up a server and (in this case) 2 clients to run
xfstests' auto group. One client mounts with default options, and the
other uses NFSv3.

I tested 3 kernels:

v6.4.0 (stock release)
6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of yesterday morning)

Here is the results summary for all 3:

KERNEL:    6.4.0
CPUS:      8

nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
Failures: generic/053 generic/099 generic/105 generic/124
   generic/193 generic/258 generic/294 generic/318 generic/319
   generic/444 generic/528 generic/529
nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
Failures: generic/053 generic/099 generic/105 generic/186
   generic/187 generic/193 generic/294 generic/318 generic/319
   generic/357 generic/444 generic/486 generic/513 generic/528
   generic/529 generic/578 generic/675 generic/688
Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s

KERNEL:    6.5.0-rc6-g4853c74bd7ab
CPUS:      8

nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
Failures: generic/053 generic/099 generic/105 generic/258
   generic/294 generic/318 generic/319 generic/444 generic/529
nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
Failures: generic/053 generic/099 generic/105 generic/186
   generic/187 generic/294 generic/318 generic/319 generic/357
   generic/444 generic/486 generic/513 generic/529 generic/578
   generic/675 generic/688
Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s

KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
CPUS:      8

nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
Failures: generic/053 generic/099 generic/105 generic/258
   generic/294 generic/318 generic/319 generic/444 generic/529
nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
Failures: generic/053 generic/099 generic/105 generic/186
   generic/187 generic/294 generic/318 generic/319 generic/357
   generic/444 generic/486 generic/513 generic/529 generic/578
   generic/675 generic/683 generic/684 generic/688
Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
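
The failure lists above can be compared mechanically instead of by eye. The following is a minimal sketch, not part of kdevops or xfstests: the helper name `diff_failures` and the variable names are made up for illustration, and the lists are copied by hand from the nfs_default summaries above.

```python
# Failure lists copied from the nfs_default summaries in this message.
V64_DEFAULT = set("""
generic/053 generic/099 generic/105 generic/186 generic/187 generic/193
generic/294 generic/318 generic/319 generic/357 generic/444 generic/486
generic/513 generic/528 generic/529 generic/578 generic/675 generic/688
""".split())

RC6_DEFAULT = set("""
generic/053 generic/099 generic/105 generic/186 generic/187 generic/294
generic/318 generic/319 generic/357 generic/444 generic/486 generic/513
generic/529 generic/578 generic/675 generic/688
""".split())

NEXT_DEFAULT = set("""
generic/053 generic/099 generic/105 generic/186 generic/187 generic/294
generic/318 generic/319 generic/357 generic/444 generic/486 generic/513
generic/529 generic/578 generic/675 generic/683 generic/684 generic/688
""".split())


def diff_failures(old, new):
    """Return (fixed, regressed): tests failing only in old, and only in new."""
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    for label, old, new in [
        ("v6.4.0 -> 6.5.0-rc6", V64_DEFAULT, RC6_DEFAULT),
        ("6.5.0-rc6 -> linux-next", RC6_DEFAULT, NEXT_DEFAULT),
    ]:
        fixed, regressed = diff_failures(old, new)
        print(f"{label}: fixed={fixed} regressed={regressed}")
```

A set difference only shows tests that moved between "failed" and "did not fail"; it cannot tell a pass from a skip, so a test that disappears from the list may simply have been skipped.
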
As long as we're sharing results ... here is what I'm seeing with a
6.5-rc6 client & server:

anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
+------+----------------------+---------+----------+------+------+------+-------+
|  run | device               | xunit   | hostname | pass | fail | skip |  time |
+------+----------------------+---------+----------+------+------+------+-------+
| 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
| 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
| 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
| 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
+------+----------------------+---------+----------+------+------+------+-------+

anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
+-------------+---------+---------+---------+---------+
|    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
+-------------+---------+---------+---------+---------+
| generic/053 | passed  | failure | failure | failure |
| generic/099 | passed  | failure | failure | failure |
| generic/105 | passed  | failure | failure | failure |
| generic/140 | skipped | skipped | skipped | failure |
| generic/188 | skipped | skipped | skipped | failure |
| generic/258 | failure | passed  | passed  | failure |
| generic/294 | failure | failure | failure | failure |
| generic/318 | passed  | failure | failure | failure |
| generic/319 | passed  | failure | failure | failure |
| generic/357 | skipped | skipped | skipped | failure |
| generic/444 | failure | failure | failure | failure |
| generic/465 | passed  | failure | failure | failure |
| generic/513 | skipped | skipped | skipped | failure |
| generic/529 | passed  | failure | failure | failure |
| generic/604 | passed  | passed  | failure | passed  |
| generic/675 | skipped | skipped | skipped | failure |
| generic/688 | skipped | skipped | skipped | failure |
| generic/697 | passed  | failure | failure | failure |
|     nfs/002 | failure | failure | failure | failure |
+-------------+---------+---------+---------+---------+


With NFSv4.2, v6.4.0 has 2 extra failures that the current mainline
kernel doesn't:

   generic/193 (some sort of setattr problem)
   generic/528 (known problem with btime handling in client that has been fixed)

While I haven't investigated, I'm assuming the 193 bug is also something
that has been fixed in recent kernels. There are also 3 other NFSv3
tests that started passing since v6.4.0. I haven't looked into those.

With the linux-next kernel there are 2 new regressions:

   generic/683
   generic/684

Both of these look like problems with setuid/setgid stripping, and still
need to be investigated. I have more verbose result info on the test
failures if anyone is interested.
Interesting that I'm not seeing the 683 & 684 failures. What type of
filesystem is your server exporting?

btrfs

You are testing linux-next? I need to go back and confirm these results
too.
IMO linux-next is quite important: we keep hitting bugs that
appear only after integration. Block and network changes in
other trees especially can impact the NFS drivers.

Indeed, I suspect this is probably something from the vfs tree (though
we definitely need to confirm that). Today I'm testing:

     6.5.0-rc6-next-20230817-g47762f086974

Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
turning off leases on the nfs server and the test started passing. I
probably won't have the cycles to chase this down further.

The capture looks something like this:

OPEN (get a write delegation)
WRITE
CLOSE
SETATTR (mode 06666)

...then presumably a task on the client opens the file again, but the
setuid bits don't get stripped.

I think either the client will need to strip these bits on a delegated
open, or we'll need to recall write delegations from the client when it
tries to do a SETATTR with a mode that could later end up needing to be
stripped on a subsequent open:

66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b is the first bad commit
commit 66ce3e3b98a7a9e970ea463a7f7dc0575c0a244b
Author: Dai Ngo <dai.ngo@xxxxxxxxxx>
Date:   Thu Jun 29 18:52:40 2023 -0700

     NFSD: Enable write delegation support
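
To make the second option above concrete, here is a minimal sketch of the check a server might apply when a SETATTR arrives for a file with an outstanding write delegation. This is not nfsd code: the function names `may_need_suid_strip` and `should_recall_write_delegation` are hypothetical, and treating any mode with a setuid or setgid bit as "could later need stripping" is a simplifying assumption. The kernel's actual stripping rules are more nuanced.

```python
import stat


def may_need_suid_strip(mode: int) -> bool:
    """True if mode carries setuid or setgid bits (simplified assumption)."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))


def should_recall_write_delegation(new_mode: int, has_write_delegation: bool) -> bool:
    """Hypothetical policy: recall the delegation before applying such a mode."""
    return has_write_delegation and may_need_suid_strip(new_mode)


if __name__ == "__main__":
    # The mode from the capture (06666) sets both setuid and setgid.
    print(should_recall_write_delegation(0o6666, True))
    # A plain 0644 mode has nothing to strip, so no recall would be needed.
    print(should_recall_write_delegation(0o644, True))
```

Under this policy the SETATTR with mode 06666 from the capture would trigger a recall, while an ordinary chmod to 0644 would not.
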

The SETATTR should cause the delegation to be recalled. However, I think
there is an optimization on the server that skips the recall if the SETATTR
comes from the same client that holds the delegation.

I'll take a look.

Thanks,
-Dai




