On Wed, 2022-03-30 at 15:38 +0000, Chuck Lever III wrote:
> > On Mar 30, 2022, at 11:03 AM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote:
> > > Hello,
> > > 
> > > during our performance testing we have noticed that commit
> > > b6669305d35a ("nfsd: Reduce the number of calls to
> > > nfsd_file_gc()") has introduced a performance regression when a
> > > client does random buffered writes. The workload on the NFS
> > > client is fio running 4 processes doing random buffered writes
> > > to 4 different files, and the files are large enough to hit
> > > dirty limits and force writeback from the client. In particular
> > > the invocation is like:
> > > 
> > > fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 \
> > >     --invalidate=1 --group_reporting=1 --runtime=300 \
> > >     --fallocate=posix --ramp_time=10 \
> > >     --name=RandomReads-128000-4k-4 --new_group --rw=randwrite \
> > >     --size=4000m --numjobs=4 --bs=4k \
> > >     --filename_format=FioWorkloads.\$jobnum --end_fsync=1
> > > 
> > > The reason why commit b6669305d35a regresses performance is the
> > > filemap_flush() call it adds to nfsd_file_put(). Before this
> > > commit, writeback on the server happened from the nfsd_commit()
> > > code, resulting in rather long semi-sequential streams of 4k
> > > writes. After commit b6669305d35a, all the writeback happens
> > > from filemap_flush() calls, resulting in a much longer average
> > > seek distance (IO from different files is more interleaved) and
> > > about a 16-20% regression in the achieved writeback throughput
> > > when the backing store is rotational storage.
> > > 
> > > I think the filemap_flush() from nfsd_file_put() is indeed
> > > rather aggressive, and I think we'd be better off leaving
> > > writeback to either nfsd_commit() or the standard dirty page
> > > cleaning happening on the system.
> > > I assume the rationale for the filemap_flush() call was to make
> > > it more likely that the file can be evicted during the garbage
> > > collection run? Was there any particular problem leading to the
> > > addition of this call, or was it just an "it seemed like a good
> > > idea" thing?
> > > 
> > > Thanks in advance for ideas.
> > > 
> > > Honza
> > 
> > It was mainly introduced to reduce the amount of work that
> > nfsd_file_free() needs to do. In particular, when re-exporting
> > NFS, the call to filp_close() can be expensive because it
> > synchronously flushes out dirty pages. That in turn means that
> > some of the calls to nfsd_file_dispose_list() can end up being
> > very expensive (particularly the ones run by the garbage
> > collector itself).
> 
> The "no regressions" rule suggests that some kind of action needs
> to be taken. I don't have a sense of whether Jan's workload or NFS
> re-export is the more common use case, however.
> 
> I can see that filp_close() on a file on an NFS mount could be
> costly if that file has dirty pages, due to the NFS client's
> "flush on close" semantic.
> 
> Trond, are there alternatives to flushing in the nfsd_file_put()
> path? I'm willing to entertain bug fix patches rather than a
> mechanical revert of b6669305d35a.

Hang on a bit. This commit went into Linux 5.6 more than two years
ago, and this is the only report so far of a regression. We can afford
the time to give it some thought, particularly given the fact that the
laundrette currently runs on the system workqueue and so really isn't
a good choice for writeback.

One option might be to simply exit early in nfs_file_flush() if the
caller is a workqueue thread (i.e. enforce close-to-open only for
userspace processes).

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
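
[For readers following along: a minimal userspace sketch of the idea
suggested above. PF_WQ_WORKER is the real task flag the kernel sets on
workqueue worker threads (include/linux/sched.h), but "struct task"
and the helper below are mocked stand-ins for illustration, not actual
nfs_file_flush() code.]

```c
#include <stdbool.h>

/* Real kernel value: task is a workqueue worker (include/linux/sched.h). */
#define PF_WQ_WORKER 0x00000020

/* Mock stand-in for the kernel's task_struct; only the flags field
 * matters for this sketch. */
struct task {
	unsigned int flags;
};

/* Sketch of the proposed early exit: skip the close-to-open flush
 * when the caller is a workqueue worker rather than a userspace
 * process, so laundrette work doesn't trigger synchronous writeback. */
static bool nfs_flush_should_skip(const struct task *current_task)
{
	return (current_task->flags & PF_WQ_WORKER) != 0;
}
```

In the real kernel the check would presumably test current->flags
inside nfs_file_flush() before the existing flush logic runs.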