Re: xfstests results over NFS

On Tue, 2023-08-22 at 12:51 -0700, dai.ngo@xxxxxxxxxx wrote:
> On 8/22/23 10:02 AM, Jeff Layton wrote:
> > On Tue, 2023-08-22 at 09:07 -0700, dai.ngo@xxxxxxxxxx wrote:
> > > On 8/17/23 4:08 PM, Jeff Layton wrote:
> > > > On Thu, 2023-08-17 at 15:59 -0700, dai.ngo@xxxxxxxxxx wrote:
> > > > > On 8/17/23 3:23 PM, dai.ngo@xxxxxxxxxx wrote:
> > > > > > On 8/17/23 2:07 PM, Jeff Layton wrote:
> > > > > > > On Thu, 2023-08-17 at 13:15 -0400, Jeff Layton wrote:
> > > > > > > > On Thu, 2023-08-17 at 16:31 +0000, Chuck Lever III wrote:
> > > > > > > > > > On Aug 17, 2023, at 12:27 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > > > > > > 
> > > > > > > > > > On Thu, 2023-08-17 at 11:17 -0400, Anna Schumaker wrote:
> > > > > > > > > > > On Thu, Aug 17, 2023 at 10:22 AM Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Thu, 2023-08-17 at 14:04 +0000, Chuck Lever III wrote:
> > > > > > > > > > > > > > On Aug 17, 2023, at 7:21 AM, Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I finally got my kdevops
> > > > > > > > > > > > > > (https://github.com/linux-kdevops/kdevops) test
> > > > > > > > > > > > > > rig working well enough to get some publishable results. To
> > > > > > > > > > > > > > run fstests,
> > > > > > > > > > > > > > kdevops will spin up a server and (in this case) 2 clients to run
> > > > > > > > > > > > > > xfstests' auto group. One client mounts with default options,
> > > > > > > > > > > > > > and the
> > > > > > > > > > > > > > other uses NFSv3.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I tested 3 kernels:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > v6.4.0 (stock release)
> > > > > > > > > > > > > > 6.5.0-rc6-g4853c74bd7ab (Linus' tree as of a couple of days ago)
> > > > > > > > > > > > > > 6.5.0-rc6-next-20230816-gef66bf8aeb91 (linux-next as of
> > > > > > > > > > > > > > yesterday morning)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Here is the results summary for all 3:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > KERNEL:    6.4.0
> > > > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > nfs_v3: 727 tests, 12 failures, 569 skipped, 14863 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/124
> > > > > > > > > > > > > >      generic/193 generic/258 generic/294 generic/318 generic/319
> > > > > > > > > > > > > >      generic/444 generic/528 generic/529
> > > > > > > > > > > > > > nfs_default: 727 tests, 18 failures, 452 skipped, 21899 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > > > >      generic/187 generic/193 generic/294 generic/318 generic/319
> > > > > > > > > > > > > >      generic/357 generic/444 generic/486 generic/513 generic/528
> > > > > > > > > > > > > >      generic/529 generic/578 generic/675 generic/688
> > > > > > > > > > > > > > Totals: 1454 tests, 1021 skipped, 30 failures, 0 errors, 35096s
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > KERNEL:    6.5.0-rc6-g4853c74bd7ab
> > > > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14775 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > > > > >      generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > > > > > nfs_default: 727 tests, 16 failures, 453 skipped, 22326 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > > > >      generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > > > > >      generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > > > > >      generic/675 generic/688
> > > > > > > > > > > > > > Totals: 1454 tests, 1023 skipped, 25 failures, 0 errors, 35396s
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > KERNEL:    6.5.0-rc6-next-20230816-gef66bf8aeb91
> > > > > > > > > > > > > > CPUS:      8
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > nfs_v3: 727 tests, 9 failures, 570 skipped, 14657 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/258
> > > > > > > > > > > > > >      generic/294 generic/318 generic/319 generic/444 generic/529
> > > > > > > > > > > > > > nfs_default: 727 tests, 18 failures, 453 skipped, 21757 seconds
> > > > > > > > > > > > > > Failures: generic/053 generic/099 generic/105 generic/186
> > > > > > > > > > > > > >      generic/187 generic/294 generic/318 generic/319 generic/357
> > > > > > > > > > > > > >      generic/444 generic/486 generic/513 generic/529 generic/578
> > > > > > > > > > > > > >      generic/675 generic/683 generic/684 generic/688
> > > > > > > > > > > > > > Totals: 1454 tests, 1023 skipped, 27 failures, 0 errors, 34870s
> > > > > > > > > > > As long as we're sharing results ... here is what I'm seeing with a
> > > > > > > > > > > 6.5-rc6 client & server:
> > > > > > > > > > > 
> > > > > > > > > > > anna@gouda ~ % xfstestsdb xunit list --results --runid 1741 --color=none
> > > > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > > > |  run | device               | xunit   | hostname | pass | fail | skip |  time |
> > > > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > > > | 1741 | server:/srv/xfs/test | tcp-3   | client   |  125 |    4 |  464 | 447 s |
> > > > > > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.0 | client   |  117 |   11 |  465 | 478 s |
> > > > > > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.1 | client   |  119 |   12 |  462 | 404 s |
> > > > > > > > > > > | 1741 | server:/srv/xfs/test | tcp-4.2 | client   |  212 |   18 |  363 | 564 s |
> > > > > > > > > > > +------+----------------------+---------+----------+------+------+------+-------+
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > anna@gouda ~ % xfstestsdb show --failure 1741 --color=none
> > > > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > > |    testcase | tcp-3   | tcp-4.0 | tcp-4.1 | tcp-4.2 |
> > > > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > > | generic/053 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/099 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/105 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/140 | skipped | skipped | skipped | failure |
> > > > > > > > > > > | generic/188 | skipped | skipped | skipped | failure |
> > > > > > > > > > > | generic/258 | failure | passed  | passed  | failure |
> > > > > > > > > > > | generic/294 | failure | failure | failure | failure |
> > > > > > > > > > > | generic/318 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/319 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/357 | skipped | skipped | skipped | failure |
> > > > > > > > > > > | generic/444 | failure | failure | failure | failure |
> > > > > > > > > > > | generic/465 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/513 | skipped | skipped | skipped | failure |
> > > > > > > > > > > | generic/529 | passed  | failure | failure | failure |
> > > > > > > > > > > | generic/604 | passed  | passed  | failure | passed  |
> > > > > > > > > > > | generic/675 | skipped | skipped | skipped | failure |
> > > > > > > > > > > | generic/688 | skipped | skipped | skipped | failure |
> > > > > > > > > > > | generic/697 | passed  | failure | failure | failure |
> > > > > > > > > > > |     nfs/002 | failure | failure | failure | failure |
> > > > > > > > > > > +-------------+---------+---------+---------+---------+
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > > > With NFSv4.2, v6.4.0 has 2 extra failures that the current
> > > > > > > > > > > > > > mainline
> > > > > > > > > > > > > > kernel doesn't:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >      generic/193 (some sort of setattr problem)
> > > > > > > > > > > > > >      generic/528 (known problem with btime handling in client
> > > > > > > > > > > > > > that has been fixed)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > While I haven't investigated, I'm assuming the 193 bug is also
> > > > > > > > > > > > > > something
> > > > > > > > > > > > > > that has been fixed in recent kernels. There are also 3 other
> > > > > > > > > > > > > > NFSv3
> > > > > > > > > > > > > > tests that started passing since v6.4.0. I haven't looked into
> > > > > > > > > > > > > > those.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > With the linux-next kernel there are 2 new regressions:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >      generic/683
> > > > > > > > > > > > > >      generic/684
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Both of these look like problems with setuid/setgid stripping,
> > > > > > > > > > > > > > and still
> > > > > > > > > > > > > > need to be investigated. I have more verbose result info on
> > > > > > > > > > > > > > the test
> > > > > > > > > > > > > > failures if anyone is interested.
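
For anyone who wants to double-check the deltas described above, here is a
minimal sketch that diffs the failure lists from the three summaries. The
lists are copied from the results in this thread (generic/ prefix dropped);
the helper name ids() and the variable names are only for illustration.

```python
def ids(text):
    """Turn a whitespace-separated list of test numbers into a set."""
    return set(text.split())

# nfs_default (NFSv4.2) failures per kernel, from the summaries above.
v4_640 = ids("053 099 105 186 187 193 294 318 319 357 444 486 513 528 529 578 675 688")
v4_rc6 = ids("053 099 105 186 187 294 318 319 357 444 486 513 529 578 675 688")
v4_next = ids("053 099 105 186 187 294 318 319 357 444 486 513 529 578 675 683 684 688")

# nfs_v3 failures for v6.4.0 and 6.5.0-rc6.
v3_640 = ids("053 099 105 124 193 258 294 318 319 444 528 529")
v3_rc6 = ids("053 099 105 258 294 318 319 444 529")

fixed_v4 = sorted(v4_640 - v4_rc6)   # failed on v6.4.0, pass on mainline
new_next = sorted(v4_next - v4_rc6)  # fail only on linux-next
fixed_v3 = sorted(v3_640 - v3_rc6)   # NFSv3 tests that started passing

print("NFSv4.2 fixed since v6.4.0:", fixed_v4)
print("NFSv4.2 new in linux-next: ", new_next)
print("NFSv3 fixed since v6.4.0:  ", fixed_v3)
```

The set differences line up with the tests called out in the text: 193 and
528 for NFSv4.2, 683 and 684 for linux-next, and three NFSv3 tests.
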
> > > > > > > > > > > Interesting that I'm not seeing the 683 & 684 failures. What type of
> > > > > > > > > > > filesystem is your server exporting?
> > > > > > > > > > > 
> > > > > > > > > > btrfs
> > > > > > > > > > 
> > > > > > > > > > You are testing linux-next? I need to go back and confirm these
> > > > > > > > > > results
> > > > > > > > > > too.
> > > > > > > > > IMO linux-next is quite important: we keep hitting bugs that
> > > > > > > > > appear only after integration -- block and network changes in
> > > > > > > > > other trees especially can impact the NFS drivers.
> > > > > > > > > 
> > > > > > > > Indeed, I suspect this is probably something from the vfs tree (though
> > > > > > > > we definitely need to confirm that). Today I'm testing:
> > > > > > > > 
> > > > > > > >        6.5.0-rc6-next-20230817-g47762f086974
> > > > > > > > 
> > > > > > > Nope, I was wrong. I ran a bisect and it landed here. I confirmed it by
> > > > > > > turning off leases on the nfs server and the test started passing. I
> > > > > > > probably won't have the cycles to chase this down further.
> > > > > > > 
> > > > > > > The capture looks something like this:
> > > > > > > 
> > > > > > > OPEN (get a write delegation)
> > > > > > > WRITE
> > > > > > > CLOSE
> > > > > > > SETATTR (mode 06666)
> > > > > > > 
> > > > > > > ...then presumably a task on the client opens the file again, but the
> > > > > > > setuid bits don't get stripped.
> > > OPEN (get a write delegation)
> > > WRITE
> > > CLOSE
> > > SETATTR (mode 06666)
> > > 
> > > The client continues with:
> > > 
> > > (ALLOCATE,GETATTR)  <<===  this is when the server stripped the SUID and SGID bits
> > > READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
> > > READDIR             ====>  file mode shows 0666  (SUID & SGID were stripped)
> > > DELERETURN
> > > 
> > > Here is the stack trace of ALLOCATE when the SUID & SGID bits were stripped:
> > > 
> > > **** start of notify_change, notice the i_mode bits, SUID & SGID were set:
> > > [notify_change]: d_iname[a] ia_valid[0x1a00] ia_mode[0x0] i_mode[0x8db6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
> > >                           KILL[0] KILL_SUID[1] KILL_SGID[1]
> > > 
> > > **** end of notify_change, notice the i_mode bits, SUID & SGID were stripped:
> > > [notify_change]: RET[0] d_iname[a] ia_valid[0x1a01] ia_mode[0x81b6] i_mode[0x81b6] [nfsd:2409:Mon Aug 21 23:05:31 2023]
> > > 
> > > **** stack trace of notify_change comes from ALLOCATE:
> > > Returning from:  0xffffffffb726e764 : notify_change+0x4/0x500 [kernel]
> > > Returning to  :  0xffffffffb726bf99 : __file_remove_privs+0x119/0x170 [kernel]
> > >    0xffffffffb726cfad : file_modified_flags+0x4d/0x110 [kernel]
> > >    0xffffffffc0a2330b : xfs_file_fallocate+0xfb/0x490 [xfs]
> > >    0xffffffffb723e7d8 : vfs_fallocate+0x158/0x380 [kernel]
> > >    0xffffffffc0ddc30a : nfsd4_vfs_fallocate+0x4a/0x70 [nfsd]
> > >    0xffffffffc0def7f2 : nfsd4_allocate+0x72/0xc0 [nfsd]
> > >    0xffffffffc0df2663 : nfsd4_proc_compound+0x3d3/0x730 [nfsd]
> > >    0xffffffffc0dd633b : nfsd_dispatch+0xab/0x1d0 [nfsd]
> > >    0xffffffffc0bda476 : svc_process_common+0x306/0x6e0 [sunrpc]
> > >    0xffffffffc0bdb081 : svc_process+0x131/0x180 [sunrpc]
> > >    0xffffffffc0dd4864 : nfsd+0x84/0xd0 [nfsd]
> > >    0xffffffffb6f0bfd6 : kthread+0xe6/0x120 [kernel]
> > >    0xffffffffb6e587d4 : ret_from_fork+0x34/0x50 [kernel]
> > >    0xffffffffb6e03a3b : ret_from_fork_asm+0x1b/0x30 [kernel]
> > > 
> > > I think the problem here is that the client does not update the file
> > > attribute after ALLOCATE. The GETATTR in the ALLOCATE compound does
> > > not include the mode bits.
> > > 
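
For readers decoding the notify_change trace above, here is a minimal sketch
that unpacks the i_mode and ia_valid values. The ATTR_* bit values are what I
recall from include/linux/fs.h and should be treated as assumptions; the
helper names are made up for this example.

```python
import stat

# Assumed ATTR_* bit values (as recalled from include/linux/fs.h).
ATTR_FLAGS = {
    0x0001: "ATTR_MODE",
    0x0200: "ATTR_FORCE",
    0x0800: "ATTR_KILL_SUID",
    0x1000: "ATTR_KILL_SGID",
}

def decode_ia_valid(value):
    """Return the names of the known flags set in an ia_valid value."""
    return [name for bit, name in sorted(ATTR_FLAGS.items()) if value & bit]

def describe_mode(mode):
    """Split an i_mode value into permission bits and SUID/SGID state."""
    return {
        "perm": oct(stat.S_IMODE(mode)),
        "regular": stat.S_ISREG(mode),
        "suid": bool(mode & stat.S_ISUID),
        "sgid": bool(mode & stat.S_ISGID),
    }

before = describe_mode(0x8db6)  # i_mode at the start of notify_change
after = describe_mode(0x81b6)   # i_mode at the end of notify_change

print("before:", before, decode_ia_valid(0x1a00))
print("after: ", after, decode_ia_valid(0x1a01))
```

In other words, 0x8db6 is a regular file with mode 06666 and 0x81b6 is the
same file with mode 0666, which agrees with the READDIR replies.
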
> > Oh, interesting! Have you tried adding the FATTR4_MODE to that GETATTR
> > call on the client? Does it also fix this?
> 
> Yes, this is what I'm going to try next.
> 

Great. Keep us posted.

> > 
> > > The READDIR replies show the test file's mode with the SUID & SGID bits
> > > stripped (0666), but apparently these were not used to update the file
> > > attributes.
> > > 
> > > The test passes when server does not grant write delegation because:
> > > 
> > > OPEN
> > > WRITE
> > > CLOSE
> > > SETATTR (06666)
> > > OPEN (CLAIM_FH, NOCREATE)
> > > ALLOCATE        <<=== server clears SUID & SGID
> > > GETATTR, CLOSE  <<=== GETATTR has mode bit as 0666, client updates file attribute
> > > READDIR
> > > READDIR
> > > 
> > > As expected, if the server recalls the write delegation on a SETATTR
> > > that sets SUID/SGID, then the test passes. This is because the recall
> > > forces the client to send the 2nd OPEN with CLAIM_FH, NOCREATE and then
> > > the (GETATTR, CLOSE), which causes the client to update the file attributes.
> > > 
> > What's your sense of the best way to fix this? The stripping of mode
> > bits isn't covered by the NFSv4 spec, so this will ultimately come down
> > to a judgment call.
> 
> Yes, I did not find anything regarding stripping of SUID/SGID in the NFS4.2
> specs. It's done by the 'fs' layer and it has been there since 4/2005 in
> the big merge to Linux-2.6.12-rc2 done by Linus. So I think we should leave
> it there.
> 
> The stripping makes some sense to me: if the file is being expanded
> (to be written to) then it should not be an executable, and therefore its
> SUID/SGID bits should be stripped.
> 

Right. The point is that POSIX requires setuid clearing, but the NFSv4
spec doesn't say anything about it. Ultimately, it's the server's
responsibility to actually clear the bits.

Having the client also fetch the mode does sound like the right thing to
do here. It should be cheap for most servers to provide anyway, given
that they will have the inode in-core.
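
To make the reasoning concrete, here is a toy model of the sequence discussed
above. It is not the kernel client or nfsd code: ToyServer, ToyClient and
run() are hypothetical names, and the "bitmap" is just a list of attribute
names. It only illustrates why a post-ALLOCATE GETATTR that omits the mode
leaves the client's cached mode stale.

```python
class ToyServer:
    """Holds the authoritative mode and strips SUID/SGID on allocate."""

    def __init__(self):
        self.mode = 0o666

    def setattr(self, mode):
        self.mode = mode

    def allocate(self):
        # Mimic the stripping seen in the trace: clear SUID and SGID.
        self.mode &= ~0o6000

    def getattr(self, wanted):
        attrs = {"mode": self.mode, "size": 0}
        return {name: attrs[name] for name in wanted}


class ToyClient:
    """Caches attributes and refreshes only what GETATTR returns."""

    def __init__(self, server):
        self.server = server
        self.cached = {}

    def setattr(self, mode):
        self.server.setattr(mode)
        self.cached["mode"] = mode

    def allocate(self, getattr_bitmap):
        # ALLOCATE followed by GETATTR in the same compound.
        self.server.allocate()
        self.cached.update(self.server.getattr(getattr_bitmap))


def run(bitmap):
    server = ToyServer()
    client = ToyClient(server)
    client.setattr(0o6666)
    client.allocate(bitmap)
    return oct(client.cached["mode"]), oct(server.mode)


stale = run(["size"])          # mode not requested: cache keeps 06666
fresh = run(["size", "mode"])  # mode requested: cache picks up 0666
print("without mode:", stale)
print("with mode:   ", fresh)
```

Whether adding the mode to the real GETATTR bitmap is sufficient is exactly
what still needs to be tested on the client.
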

-- 
Jeff Layton <jlayton@xxxxxxxxxx>



