> On Mar 30, 2022, at 12:19 PM, Jan Kara <jack@xxxxxxx> wrote: > > On Wed 30-03-22 15:38:38, Chuck Lever III wrote: >>> On Mar 30, 2022, at 11:03 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: >>> >>> On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote: >>>> Hello, >>>> >>>> during our performance testing we have noticed that commit >>>> b6669305d35a >>>> ("nfsd: Reduce the number of calls to nfsd_file_gc()") has introduced >>>> a >>>> performance regression when a client does random buffered writes. The >>>> workload on NFS client is fio running 4 processed doing random >>>> buffered writes to 4 >>>> different files and the files are large enough to hit dirty limits >>>> and >>>> force writeback from the client. In particular the invocation is >>>> like: >>>> >>>> fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 -- >>>> invalidate=1 --group_reporting=1 --runtime=300 --fallocate=posix -- >>>> ramp_time=10 --name=RandomReads-128000-4k-4 --new_group -- >>>> rw=randwrite --size=4000m --numjobs=4 --bs=4k -- >>>> filename_format=FioWorkloads.\$jobnum --end_fsync=1 >>>> >>>> The reason why commit b6669305d35a regresses performance is the >>>> filemap_flush() call it adds into nfsd_file_put(). Before this commit >>>> writeback on the server happened from nfsd_commit() code resulting in >>>> rather long semisequential streams of 4k writes. After commit >>>> b6669305d35a >>>> all the writeback happens from filemap_flush() calls resulting in >>>> much >>>> longer average seek distance (IO from different files is more >>>> interleaved) >>>> and about 16-20% regression in the achieved writeback throughput when >>>> the >>>> backing store is rotational storage. >>>> >>>> I think the filemap_flush() from nfsd_file_put() is indeed rather >>>> aggressive and I think we'd be better off to just leave writeback to >>>> either >>>> nfsd_commit() or standard dirty page cleaning happening on the >>>> system. I >>>> assume the rationale for the filemap_flush() call was to make it more >>>> likely the file can be evicted during the garbage collection run? Was >>>> there >>>> any particular problem leading to addition of this call or was it >>>> just "it >>>> seemed like a good idea" thing? >>>> >>>> Thanks in advance for ideas. >>>> >>>> Honza >>> >>> It was mainly introduced to reduce the amount of work that >>> nfsd_file_free() needs to do. In particular when re-exporting NFS, the >>> call to filp_close() can be expensive because it synchronously flushes >>> out dirty pages. That again means that some of the calls to >>> nfsd_file_dispose_list() can end up being very expensive (particularly >>> the ones run by the garbage collector itself). >> >> The "no regressions" rule suggests that some kind of action needs >> to be taken. I don't have a sense of whether Jan's workload or NFS >> re-export is the more common use case, however. >> >> I can see that filp_close() on a file on an NFS mount could be >> costly if that file has dirty pages, due to the NFS client's >> "flush on close" semantic. >> >> Trond, are there alternatives to flushing in the nfsd_file_put() >> path? I'm willing to entertain bug fix patches rather than a >> mechanical revert of b6669305d35a. > > Yeah, I don't think we need to rush fixing this with a revert. Sorry I wasn't clear: I would prefer to apply a bug fix over sending a revert commit, and I do not have enough information yet to make that choice. Waiting a bit is OK with me. > Also because > just removing the filemap_flush() from nfsd_file_put() would keep other > benefits of that commit while fixing the regression AFAIU. But I think > making flushing less aggressive is desirable because as I wrote in my other > reply, currently it is preventing pagecache from accumulating enough dirty > data for a good IO pattern. I might even go so far as to say that a small write workload isn't especially good for solid state storage either. I know Trond is trying to address NFS re-export performance, but there appear to be some palpable effects outside of that narrow use case that need to be considered. Thus a server-side fix, rather than a fix in the NFS client used to do the re-export, seems appropriate to consider. -- Chuck Lever