On Fri, 2023-01-27 at 16:17 +0000, Chuck Lever III wrote: > > > On Jan 27, 2023, at 11:12 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > On Fri, 2023-01-27 at 15:35 +0000, Chuck Lever III wrote: > > > > > > > On Jan 27, 2023, at 10:30 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > On Fri, 2023-01-27 at 15:21 +0000, Chuck Lever III wrote: > > > > > > > > > > > On Jan 27, 2023, at 7:09 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > > > We had a bug report that xfstest generic/355 was failing on NFSv4.0. > > > > > > This test sets various combinations of setuid/setgid modes and tests > > > > > > whether DIO writes will cause them to be stripped. > > > > > > > > > > > > What I found was that the server did properly strip those bits, but > > > > > > the client didn't notice because it held a delegation that was not > > > > > > recalled. The recall didn't occur because the client itself was the > > > > > > one generating the activity and we avoid recalls in that case. > > > > > > > > > > > > Clearing setuid bits is an "implicit" activity. The client didn't > > > > > > specifically request that we do that, so we need the server to issue a > > > > > > CB_RECALL, or avoid the situation entirely by not issuing a delegation. > > > > > > > > > > > > The easiest fix here is to simply not give out a delegation if the file > > > > > > is being opened for write, and the mode has the setuid and/or setgid bit > > > > > > set. Note that there is a potential race between the mode and lease > > > > > > being set, so we test for this condition both before and after setting > > > > > > the lease. > > > > > > > > > > > > This patch fixes generic/355, generic/683 and generic/684 for me. > > > > > > > > > > generic/355 2s ... 1s > > > > > > > > > > > > > I should note that 355 only fails with vers=4.0. On 4.1+ the client > > > > specifies that it doesn't want a delegation (as this test is doing DIO). > > > > > > I used a NFSv4.0 mount for the test. > > > > > > > > > > > That's good. > > > > > > > > > > generic/683 2s ... [not run] xfs_io falloc failed (old kernel/wrong fs?) > > > > > generic/684 2s ... [not run] xfs_io fpunch failed (old kernel/wrong fs?) > > > > > > > > > > What am I doing wrong? > > > > > > > > > > > > > > > > > > Not sure here. This test requires v4.2, but the client and server should > > > > negotiate that. You might need to run the test by hand and see what it > > > > outputs. i.e.: > > > > > > > > $ sudo ./tests/generic/683 > > > > > > Then, these two failed only for NFSv4.2 and are not run for other > > > minor versions. For some reason I thought this was an NFSv4.0-only > > > bug. > > > > > > > Ok, that's expected. I should have been more clear. 355 only fails on > > v4.0 only, but 683 and 684 require v4.2 to run and fail. > > I'll add that clarification to the patch description. > Thanks! > > > The cause of all 3 failures are the same though, and this patch should > > fix it. > > Since this regression has been around for a bit, I plan to apply > the fix to nfsd-next. Anyone who wants it backported to stable can > apply it and test it -- I'm going to leave off a Cc: stable or Fixes:. > I think that sounds reasonable. Setuid files are pretty rare overall, and this only affects the writing client's perception of them. It's hard to nail down a Fixes: for this. I suspect the regression came about once we stopped recalling delegations for activity by the same client, but I haven't tested that theory. -- Jeff Layton <jlayton@xxxxxxxxxx>