Re: Performance regression with random IO pattern from the client

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Wed, 30 Mar 2022 15:38:38 +0000

> On Mar 30, 2022, at 11:03 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote:
>> Hello,
>> 
>> during our performance testing we have noticed that commit
>> b6669305d35a
>> ("nfsd: Reduce the number of calls to nfsd_file_gc()") has introduced
>> a
>> performance regression when a client does random buffered writes. The
>> workload on NFS client is fio running 4 processed doing random
>> buffered writes to 4
>> different files and the files are large enough to hit dirty limits
>> and
>> force writeback from the client. In particular the invocation is
>> like:
>> 
>> fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 --
>> invalidate=1 --group_reporting=1 --runtime=300 --fallocate=posix --
>> ramp_time=10 --name=RandomReads-128000-4k-4 --new_group --
>> rw=randwrite --size=4000m --numjobs=4 --bs=4k --
>> filename_format=FioWorkloads.\$jobnum --end_fsync=1
>> 
>> The reason why commit b6669305d35a regresses performance is the
>> filemap_flush() call it adds into nfsd_file_put(). Before this commit
>> writeback on the server happened from nfsd_commit() code resulting in
>> rather long semisequential streams of 4k writes. After commit
>> b6669305d35a
>> all the writeback happens from filemap_flush() calls resulting in
>> much
>> longer average seek distance (IO from different files is more
>> interleaved)
>> and about 16-20% regression in the achieved writeback throughput when
>> the
>> backing store is rotational storage.
>> 
>> I think the filemap_flush() from nfsd_file_put() is indeed rather
>> aggressive and I think we'd be better off to just leave writeback to
>> either
>> nfsd_commit() or standard dirty page cleaning happening on the
>> system. I
>> assume the rationale for the filemap_flush() call was to make it more
>> likely the file can be evicted during the garbage collection run? Was
>> there
>> any particular problem leading to addition of this call or was it
>> just "it
>> seemed like a good idea" thing?
>> 
>> Thanks in advance for ideas.
>> 
>>                                                                 Honza
> 
> It was mainly introduced to reduce the amount of work that
> nfsd_file_free() needs to do. In particular when re-exporting NFS, the
> call to filp_close() can be expensive because it synchronously flushes
> out dirty pages. That again means that some of the calls to
> nfsd_file_dispose_list() can end up being very expensive (particularly
> the ones run by the garbage collector itself).

The "no regressions" rule suggests that some kind of action needs
to be taken. I don't have a sense of whether Jan's workload or NFS
re-export is the more common use case, however.

I can see that filp_close() on a file on an NFS mount could be
costly if that file has dirty pages, due to the NFS client's
"flush on close" semantic.

Trond, are there alternatives to flushing in the nfsd_file_put()
path? I'm willing to entertain bug fix patches rather than a
mechanical revert of b6669305d35a.

--
Chuck Lever