On Wed 30-03-22 15:38:38, Chuck Lever III wrote: > > On Mar 30, 2022, at 11:03 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote: > >> Hello, > >> > >> during our performance testing we have noticed that commit > >> b6669305d35a > >> ("nfsd: Reduce the number of calls to nfsd_file_gc()") has introduced > >> a > >> performance regression when a client does random buffered writes. The > >> workload on NFS client is fio running 4 processed doing random > >> buffered writes to 4 > >> different files and the files are large enough to hit dirty limits > >> and > >> force writeback from the client. In particular the invocation is > >> like: > >> > >> fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 -- > >> invalidate=1 --group_reporting=1 --runtime=300 --fallocate=posix -- > >> ramp_time=10 --name=RandomReads-128000-4k-4 --new_group -- > >> rw=randwrite --size=4000m --numjobs=4 --bs=4k -- > >> filename_format=FioWorkloads.\$jobnum --end_fsync=1 > >> > >> The reason why commit b6669305d35a regresses performance is the > >> filemap_flush() call it adds into nfsd_file_put(). Before this commit > >> writeback on the server happened from nfsd_commit() code resulting in > >> rather long semisequential streams of 4k writes. After commit > >> b6669305d35a > >> all the writeback happens from filemap_flush() calls resulting in > >> much > >> longer average seek distance (IO from different files is more > >> interleaved) > >> and about 16-20% regression in the achieved writeback throughput when > >> the > >> backing store is rotational storage. > >> > >> I think the filemap_flush() from nfsd_file_put() is indeed rather > >> aggressive and I think we'd be better off to just leave writeback to > >> either > >> nfsd_commit() or standard dirty page cleaning happening on the > >> system. I > >> assume the rationale for the filemap_flush() call was to make it more > >> likely the file can be evicted during the garbage collection run? Was > >> there > >> any particular problem leading to addition of this call or was it > >> just "it > >> seemed like a good idea" thing? > >> > >> Thanks in advance for ideas. > >> > >> Honza > > > > It was mainly introduced to reduce the amount of work that > > nfsd_file_free() needs to do. In particular when re-exporting NFS, the > > call to filp_close() can be expensive because it synchronously flushes > > out dirty pages. That again means that some of the calls to > > nfsd_file_dispose_list() can end up being very expensive (particularly > > the ones run by the garbage collector itself). > > The "no regressions" rule suggests that some kind of action needs > to be taken. I don't have a sense of whether Jan's workload or NFS > re-export is the more common use case, however. > > I can see that filp_close() on a file on an NFS mount could be > costly if that file has dirty pages, due to the NFS client's > "flush on close" semantic. > > Trond, are there alternatives to flushing in the nfsd_file_put() > path? I'm willing to entertain bug fix patches rather than a > mechanical revert of b6669305d35a. Yeah, I don't think we need to rush fixing this with a revert. Also because just removing the filemap_flush() from nfsd_file_put() would keep other benefits of that commit while fixing the regression AFAIU. But I think making flushing less aggressive is desirable because as I wrote in my other reply, currently it is preventing pagecache from accumulating enough dirty data for a good IO pattern. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR