> On Sep 4, 2020, at 6:01 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> 
> On Mon, Aug 24, 2020 at 11:42:18AM -0400, Chuck Lever wrote:
>> 
>> 
>>> On Aug 24, 2020, at 10:22 AM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, Aug 24, 2020 at 09:39:31AM -0400, Chuck Lever wrote:
>>>> 
>>>> 
>>>>> On Aug 19, 2020, at 5:29 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>>>> 
>>>>> On Tue, Aug 18, 2020 at 05:26:26PM -0400, Chuck Lever wrote:
>>>>>> 
>>>>>>> On Aug 17, 2020, at 6:20 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> On Sun, Aug 16, 2020 at 04:46:00PM -0400, Chuck Lever wrote:
>>>>>>> 
>>>>>>>> In order of application:
>>>>>>>> 
>>>>>>>> 5920afa3c85f ("nfsd: hook nfsd_commit up to the nfsd_file cache")
>>>>>>>> 961.68user 5252.40system 20:12.30elapsed 512%CPU, 2541 DELAY errors
>>>>>>>> These results are similar to v5.3.
>>>>>>>> 
>>>>>>>> fd4f83fd7dfb ("nfsd: convert nfs4_file->fi_fds array to use nfsd_files")
>>>>>>>> Does not build
>>>>>>>> 
>>>>>>>> eb82dd393744 ("nfsd: convert fi_deleg_file and ls_file fields to nfsd_file")
>>>>>>>> 966.92user 5425.47system 33:52.79elapsed 314%CPU, 1330 DELAY errors
>>>>>>>> 
>>>>>>>> Can you take a look and see if there's anything obvious?
>>>>>>> 
>>>>>>> Unfortunately nothing about the file cache code is very obvious to me.
>>>>>>> I'm looking at it....
>>>>>>> 
>>>>>>> It adds some new nfserr_jukebox returns in nfsd_file_acquire. Those
>>>>>>> mostly look like kmalloc failures, the one I'm not sure about is the
>>>>>>> NFSD_FILE_HASHED check.
>>>>>>> 
>>>>>>> Or maybe it's the lease break there.
>>>>>> 
>>>>>> nfsd_file_acquire() always calls fh_verify() before it invokes nfsd_open().
>>>>>> Replacing nfs4_get_vfs_file's nfsd_open() call with nfsd_file_acquire() adds
>>>>>> almost 10 million fh_verify() calls to my test run.
>>>>> 
>>>>> Checking out the code as of fd4f83fd7dfb....
>>>>> 
>>>>> nfsd_file_acquire() calls nfsd_open_verified().
>>>>> 
>>>>> And nfsd_open() is basically just fh_verify()+nfsd_open_verified().
>>>>> 
>>>>> So it doesn't look like the replacement of nfsd_open() by
>>>>> nfsd_file_acquire() should have changed the number of fh_verify() calls.
>>>> 
>>>> I see a lot more vfs_setlease() failures after fd4f83fd7dfb.
>>>> check_conflicting_open() fails because "inode is open for write":
>>>> 
>>>> 1780         if (arg == F_RDLCK)
>>>> 1781                 return inode_is_open_for_write(inode) ? -EAGAIN : 0;
>>>> 
>>>> The behavior on the wire is that the server simply doesn't hand out
>>>> many delegations.
>>>> 
>>>> NFSv4 OPEN uses nfsd_file_acquire() now, but I don't see CLOSE
>>>> releasing the cached file descriptor. Wouldn't that cached
>>>> descriptor conflict with subsequent OPENs?
>>> 
>>> Could be, yes.
>>> 
>>> That also reminds me of this patch, did I already send it to you?
>> 
>> I don't have this one. I can try it.
> 
> No difference, I take it?

It helps a little, but it doesn't appear to address the regression.
It does seem like a reasonable change to take anyway. You could take
both of the earlier patches in this thread for v5.10, IMO.

> There could also be something wrong with nfsd4_check_conflicting_opens()
> that's preventing delegations when it shouldn't.

I've added some instrumentation there. It looks like @writes is still
positive because of a previous cached open. The NFS request stream goes
something like this:

    OPEN(CREATE) access=WRITE name=yada
    WRITE
    CLOSE
    ... here is where the cached WR_ONLY open remains
    OPEN(NOCREATE) access=READ name=yada
    ... conflict detected, no soup for you.
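To make the failure mode concrete, here is a rough sketch of the
accounting involved. This is only an illustration, not the actual
nfsd4_check_conflicting_opens() code, and the helper name is made up:
the point is that the inode's writer count is compared against the
write opens attached to the *current* nfs4_file, and the WR_ONLY
struct file that the filecache kept alive past the CLOSE is no longer
attached to any nfs4_file, so nothing subtracts it.

/*
 * Illustration only -- not the actual nfsd4_check_conflicting_opens()
 * code.  i_writecount still reflects the WR_ONLY struct file that the
 * filecache kept open after the client's CLOSE, but that file is no
 * longer attached to the new nfs4_file, so nothing subtracts it and
 * the read delegation is refused.
 */
static bool writes_still_conflict(struct nfs4_file *fp, struct inode *inode)
{
	int writes = atomic_read(&inode->i_writecount);

	/* subtract the write opens nfsd holds for this nfs4_file */
	if (fp->fi_fds[O_WRONLY])
		writes--;
	if (fp->fi_fds[O_RDWR])
		writes--;

	/* the orphaned cached open from the earlier OPEN is invisible here */
	return writes > 0;
}

If CLOSE released the cached descriptor, the count would drop back to
zero here and the read delegation could be offered.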
I checked, and before the filecache support was added, the server does
offer a read delegation in this case.

> There might also be some way fh_verify() could be smarter. There's a
> big comment there explaining why we repeat the permission checks each
> time, but maybe we could keep a flag somewhere that tracks whether we
> really need to call nfsd_setuser again.

I'll have a look at that more closely once we are clear of this
performance regression. The architecture of how fh_verify() is used has
been around forever, and it seems like a separate issue. (A rough
sketch of that idea is below.)

I've added some tracepoints that report fh_verify() calls and each
change to the current FH. Here's a single OPEN call on the server.
Note the number of fh_verify() calls and how long they take (timestamps
are in seconds since boot). This is from a single-thread test on the
server, so there's no other activity going on.

6398.079629: nfsd_compound: xid=0x59f1891e opcnt=5
6398.079632: nfsd_fh_current: xid=0x59f1891e fh_hash=0x0a393779
6398.079685: nfsd_fh_verify: xid=0x59f1891e fh_hash=0x0a393779 type=NONE access=BYPASS_GSS status=0
6398.079687: nfsd_compound_status: xid=0x59f1891e op=1/5 OP_PUTFH status=0
6398.079689: nfsd_open_args: xid=0x59f1891e seqid=12 type=NOCREATE claim=NULL share=READ name=Makefile
6398.079753: nfsd_fh_verify: xid=0x59f1891e fh_hash=0x0a393779 type=DIR access=EXEC status=0
6398.079789: nfsd_fh_verify: xid=0x59f1891e fh_hash=0x2c049d00 type=REG access=READ|READ_IF_EXEC status=0
6398.079830: nfsd_fh_verify: xid=0x59f1891e fh_hash=0x2c049d00 type=REG access=READ|OWNER_OVERRIDE status=0
6398.079833: nfsd_file_acquire: xid=0x59f1891e hash=0xe28 inode=0xffff88873f4777b0 may_flags=READ ref=2 nf_flags=HASHED|REFERENCED nf_may=READ nf_file=0xffff88872cc53e00 status=0
6398.079868: generic_add_lease: dev=0x0:0x23 ino=0xe42af wcount=0 rcount=1 icount=2 fl_owner=0xffff88871914cfd8 fl_flags=FL_DELEG fl_type=F_RDLCK
6398.079876: locks_get_lock_context: dev=0x0:0x23 ino=0xe42af type=F_RDLCK ctx=0xffff8886d8a43b58
6398.079881: nfsd_deleg_open: client 5f529c2f:b9aedadc stateid 002471d9:00000001
6398.079883: nfsd_deleg_none: client 5f529c2f:b9aedadc stateid 002471d8:00000001
6398.079901: nfsd_fh_current: xid=0x59f1891e fh_hash=0x2c049d00
6398.079904: nfsd_compound_status: xid=0x59f1891e op=2/5 OP_OPEN status=0
6398.079906: nfsd_compound_status: xid=0x59f1891e op=3/5 OP_GETFH status=0
6398.079941: nfsd_fh_verify: xid=0x59f1891e fh_hash=0x2c049d00 type=NONE access= status=0
6398.079947: nfsd_compound_status: xid=0x59f1891e op=4/5 OP_ACCESS status=0
6398.079948: nfsd_get_fattr4: xid=0x59f1891e bm[0]=TYPE|CHANGE|SIZE|FSID|FILEID bm[1]=MODE|NUMLINKS|OWNER|OWNER_GROUP|RAWDEV|SPACE_USED|TIME_ACCESS|TIME_METADATA|TIME_MODIFY|MOUNTED_ON_FILEID bm[2]=
6398.079980: nfsd_fh_verify: xid=0x59f1891e fh_hash=0x2c049d00 type=NONE access= status=0
6398.079985: nfsd_compound_status: xid=0x59f1891e op=5/5 OP_GETATTR status=0

> Based on your and Frank's experiences I'm also sympathetic to the idea
> that maybe the filehandle cache just gets in the way in the v4 case.

I think it does get in the way for NFSv4. Clearly, if there are NFSv3
users, the filecache is a good way to block handing out delegations.
For NFSv4, once there are no clients (or local users) using a file, the
server should be permitted to offer delegations on it. I don't see any
benefit to holding onto file descriptors that clients say explicitly
they are done using.

My worry is that after the 5 kernel releases since this code went in,
it won't be a simple set of reverts to back out NFSv4 filecache usage.
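Coming back to the fh_verify() suggestion above: a flag that avoids the
redundant nfsd_setuser() work might look roughly like the sketch below.
Purely illustrative: rq_setuser_exp is a made-up field, and a real
version would also need to notice when the credential changes and when
the export table is reloaded.

/*
 * Sketch only: rq_setuser_exp does not exist in struct svc_rqst.  The
 * idea is to remember which export we last called nfsd_setuser() for
 * and skip the call when it would repeat the same work.  The cached
 * value would have to be cleared at the start of each RPC.
 */
static int nfsd_setuser_once(struct svc_rqst *rqstp, struct svc_export *exp)
{
	if (rqstp->rq_setuser_exp == exp)
		return 0;	/* same export, same cred: nothing to do */

	rqstp->rq_setuser_exp = exp;
	return nfsd_setuser(rqstp, exp);
}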
> --b.
> 
>>> Author: J. Bruce Fields <bfields@xxxxxxxxxx>
>>> Date:   Fri Jul 17 18:54:54 2020 -0400
>>> 
>>>     nfsd: Cache R, RW, and W opens separately
>>> 
>>>     The nfsd open code has always kept separate read-only, read-write, and
>>>     write-only opens as necessary to ensure that when a client closes or
>>>     downgrades, we don't retain more access than necessary.
>>> 
>>>     Honestly, I'm not sure if that's completely necessary, but I'd rather
>>>     stick to that behavior.
>>> 
>>>     Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>>> 
>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>> index 82198d747c4c..4b6f70e0d987 100644
>>> --- a/fs/nfsd/filecache.c
>>> +++ b/fs/nfsd/filecache.c
>>> @@ -891,7 +891,7 @@ nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
>>> 
>>>  	hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
>>>  				 nf_node, lockdep_is_held(&nfsd_file_hashtbl[hashval].nfb_lock)) {
>>> -		if ((need & nf->nf_may) != need)
>>> +		if (nf->nf_may != need)
>>>  			continue;
>>>  		if (nf->nf_inode != inode)
>>>  			continue;
>> 
>> --
>> Chuck Lever

--
Chuck Lever