> On Jul 1, 2019, at 11:17 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote:
>> Interesting work! Kudos to you and Jeff.
>>
>>> On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@xxxxxxxxx>
>>> wrote:
>>>
>>> When an NFSv3 READ or WRITE request comes in, the first thing knfsd
>>> has to do is open a new file descriptor. While this is often a
>>> relatively inexpensive thing to do for most local filesystems, it is
>>> usually less so for FUSE, clustered or networked filesystems that
>>> are being exported by knfsd.
>>
>> True, I haven't measured much effect, if any, from open and close on
>> local file systems. It would be valuable if the cover letter provided
>> a more quantified assessment of the cost for these other use cases.
>> It sounds plausible to me that they would be more expensive, but I'm
>> wondering whether the additional complexity of an open file cache is
>> warranted and effective. Do you have any benchmark results to share?
>>
>> Are there particular workloads where you believe open caching will be
>> especially beneficial?
>
> I'd expect pretty much anything with a nontrivial open() method
> (i.e. FUSE, GFS2, OCFS2, Ceph, etc.) to benefit.
>
> I've seen no slowdowns so far with traditional filesystems, i.e. ext4
> and xfs.
>
> Note that the removal of the raparms cache does in many ways
> compensate for the new need to look up the struct file.
>
>>> This set of patches attempts to reduce some of that cost by caching
>>> open file descriptors so that they may be reused by other incoming
>>> READ/WRITE requests for the same file.
>>
>> Is the open file cache a single cache per server? Wondering if there
>> can be significant interference (e.g. lock contention or cache
>> sloshing) between separate workloads on different exports, for
>> example.
>
> The file cache is global. Cache lookups are lockless (i.e. RCU
> protected), so there is little contention for the case where there is
> already an entry. For the case where we have to add an entry, there is
> a mutex that might get contended in the case of workloads with lots of
> small file opens and closes.
>
>> Do you have any benchmark results that show that removing the raparms
>> cache is harmless?
>
> The same information is carried in struct file. The whole raparms
> cache was just a hack in order to allow us to port the readahead
> information across struct file instances. Now that we are caching the
> struct file itself, the raparms hack is unnecessary.

OK. I see the patch description of 11/16 mentions something about "stop
fiddling with raparms", but IMO the patch description for 13/16 should
be changed to make the above clear. Thanks!

> IOW: I haven't seen any slowdowns so far; however, I don't have access
> to a bleeding-edge networking setup that would push this further.
>
>>> One danger when doing this is that knfsd may end up caching file
>>> descriptors for files that have been unlinked. In order to deal with
>>> this issue, we use fsnotify to monitor the files, and have hooks to
>>> evict those descriptors from the file cache if the i_nlink value
>>> goes to 0.
>>>
>>> Jeff Layton (12):
>>>   sunrpc: add a new cache_detail operation for when a cache is flushed
>>>   locks: create a new notifier chain for lease attempts
>>>   nfsd: add a new struct file caching facility to nfsd
>>>   nfsd: hook up nfsd_write to the new nfsd_file cache
>>>   nfsd: hook up nfsd_read to the nfsd_file cache
>>>   nfsd: hook nfsd_commit up to the nfsd_file cache
>>>   nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>>>   nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>>>   nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>>>   nfsd: have nfsd_test_lock use the nfsd_file cache
>>>   nfsd: rip out the raparms cache
>>>   nfsd: close cached files prior to a REMOVE or RENAME that would
>>>     replace target
>>>
>>> Trond Myklebust (4):
>>>   notify: export symbols for use by the knfsd file cache
>>>   vfs: Export flush_delayed_fput for use by knfsd.
>>>   nfsd: Fix up some unused variable warnings
>>>   nfsd: Fix the documentation for svcxdr_tmpalloc()
>>>
>>>  fs/file_table.c                  |   1 +
>>>  fs/locks.c                       |  62 +++
>>>  fs/nfsd/Kconfig                  |   1 +
>>>  fs/nfsd/Makefile                 |   3 +-
>>>  fs/nfsd/blocklayout.c            |   3 +-
>>>  fs/nfsd/export.c                 |  13 +
>>>  fs/nfsd/filecache.c              | 885 +++++++++++++++++++++++++++++++
>>>  fs/nfsd/filecache.h              |  60 +++
>>>  fs/nfsd/nfs4layouts.c            |  12 +-
>>>  fs/nfsd/nfs4proc.c               |  83 +--
>>>  fs/nfsd/nfs4state.c              | 183 ++++---
>>>  fs/nfsd/nfs4xdr.c                |  31 +-
>>>  fs/nfsd/nfssvc.c                 |  16 +-
>>>  fs/nfsd/state.h                  |  10 +-
>>>  fs/nfsd/trace.h                  | 140 +++++
>>>  fs/nfsd/vfs.c                    | 295 ++++-------
>>>  fs/nfsd/vfs.h                    |   9 +-
>>>  fs/nfsd/xdr4.h                   |  19 +-
>>>  fs/notify/fsnotify.h             |   2 -
>>>  fs/notify/group.c                |   2 +
>>>  fs/notify/mark.c                 |   6 +
>>>  include/linux/fs.h               |   5 +
>>>  include/linux/fsnotify_backend.h |   2 +
>>>  include/linux/sunrpc/cache.h     |   1 +
>>>  net/sunrpc/cache.c               |   3 +
>>>  25 files changed, 1465 insertions(+), 382 deletions(-)
>>>  create mode 100644 fs/nfsd/filecache.c
>>>  create mode 100644 fs/nfsd/filecache.h
>>>
>>> --
>>> 2.21.0
>>>
>>
>> --
>> Chuck Lever
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever
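
[Editor's illustration] The concurrency design Trond describes above
(lockless RCU-protected lookups, with a mutex serializing only
insertions) follows a common kernel pattern. Below is a minimal sketch
of that pattern; every identifier in it (cached_file, cache_find,
cache_insert, and so on) is an illustrative assumption, not the actual
fs/nfsd/filecache.c code.

/*
 * Sketch only: RCU-protected cache lookup with mutex-guarded
 * insertion, as described in the discussion above. Hypothetical
 * names throughout -- not the real nfsd file cache.
 */
#include <linux/fs.h>
#include <linux/mutex.h>
#include <linux/rculist.h>
#include <linux/refcount.h>

struct cached_file {
	struct hlist_node  cf_node;
	struct inode      *cf_inode;   /* lookup key */
	struct file       *cf_file;    /* cached open struct file */
	refcount_t         cf_ref;
	struct rcu_head    cf_rcu;
};

static HLIST_HEAD(cache_head);         /* single bucket, for brevity */
static DEFINE_MUTEX(cache_mutex);      /* serializes insertions only */

/*
 * Fast path: no locks taken. Readers walk the list under
 * rcu_read_lock() and take a reference before returning, so repeated
 * READ/WRITE requests for an already-cached file never contend.
 */
static struct cached_file *cache_find(struct inode *inode)
{
	struct cached_file *cf;

	rcu_read_lock();
	hlist_for_each_entry_rcu(cf, &cache_head, cf_node) {
		if (cf->cf_inode == inode &&
		    refcount_inc_not_zero(&cf->cf_ref)) {
			rcu_read_unlock();
			return cf;
		}
	}
	rcu_read_unlock();
	return NULL;
}

/*
 * Slow path: the mutex is only contended by workloads opening many
 * new files, matching the "lots of small file opens and closes"
 * caveat above.
 */
static void cache_insert(struct cached_file *cf)
{
	mutex_lock(&cache_mutex);
	hlist_add_head_rcu(&cf->cf_node, &cache_head);
	mutex_unlock(&cache_mutex);
}

In the same vein, the unlink handling described in the cover letter
would presumably pair hlist_del_rcu() with kfree_rcu() when the
fsnotify hook observes i_nlink dropping to zero, with the cached
struct file released via fput() once the last reference is put.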