Re: Performance regression with random IO pattern from the client

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Wed, 30 Mar 2022 17:56:36 +0000

> On Mar 30, 2022, at 12:19 PM, Jan Kara <jack@xxxxxxx> wrote:
> 
> On Wed 30-03-22 15:38:38, Chuck Lever III wrote:
>>> On Mar 30, 2022, at 11:03 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote:
>>>> Hello,
>>>> 
>>>> during our performance testing we have noticed that commit
>>>> b6669305d35a
>>>> ("nfsd: Reduce the number of calls to nfsd_file_gc()") has introduced
>>>> a
>>>> performance regression when a client does random buffered writes. The
>>>> workload on NFS client is fio running 4 processed doing random
>>>> buffered writes to 4
>>>> different files and the files are large enough to hit dirty limits
>>>> and
>>>> force writeback from the client. In particular the invocation is
>>>> like:
>>>> 
>>>> fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 --
>>>> invalidate=1 --group_reporting=1 --runtime=300 --fallocate=posix --
>>>> ramp_time=10 --name=RandomReads-128000-4k-4 --new_group --
>>>> rw=randwrite --size=4000m --numjobs=4 --bs=4k --
>>>> filename_format=FioWorkloads.\$jobnum --end_fsync=1
>>>> 
>>>> The reason why commit b6669305d35a regresses performance is the
>>>> filemap_flush() call it adds into nfsd_file_put(). Before this commit
>>>> writeback on the server happened from nfsd_commit() code resulting in
>>>> rather long semisequential streams of 4k writes. After commit
>>>> b6669305d35a
>>>> all the writeback happens from filemap_flush() calls resulting in
>>>> much
>>>> longer average seek distance (IO from different files is more
>>>> interleaved)
>>>> and about 16-20% regression in the achieved writeback throughput when
>>>> the
>>>> backing store is rotational storage.
>>>> 
>>>> I think the filemap_flush() from nfsd_file_put() is indeed rather
>>>> aggressive and I think we'd be better off to just leave writeback to
>>>> either
>>>> nfsd_commit() or standard dirty page cleaning happening on the
>>>> system. I
>>>> assume the rationale for the filemap_flush() call was to make it more
>>>> likely the file can be evicted during the garbage collection run? Was
>>>> there
>>>> any particular problem leading to addition of this call or was it
>>>> just "it
>>>> seemed like a good idea" thing?
>>>> 
>>>> Thanks in advance for ideas.
>>>> 
>>>>                                                                Honza
>>> 
>>> It was mainly introduced to reduce the amount of work that
>>> nfsd_file_free() needs to do. In particular when re-exporting NFS, the
>>> call to filp_close() can be expensive because it synchronously flushes
>>> out dirty pages. That again means that some of the calls to
>>> nfsd_file_dispose_list() can end up being very expensive (particularly
>>> the ones run by the garbage collector itself).
>> 
>> The "no regressions" rule suggests that some kind of action needs
>> to be taken. I don't have a sense of whether Jan's workload or NFS
>> re-export is the more common use case, however.
>> 
>> I can see that filp_close() on a file on an NFS mount could be
>> costly if that file has dirty pages, due to the NFS client's
>> "flush on close" semantic.
>> 
>> Trond, are there alternatives to flushing in the nfsd_file_put()
>> path? I'm willing to entertain bug fix patches rather than a
>> mechanical revert of b6669305d35a.
> 
> Yeah, I don't think we need to rush fixing this with a revert.

Sorry I wasn't clear: I would prefer to apply a bug fix over
sending a revert commit, and I do not have enough information
yet to make that choice. Waiting a bit is OK with me.

> Also because
> just removing the filemap_flush() from nfsd_file_put() would keep other
> benefits of that commit while fixing the regression AFAIU. But I think
> making flushing less aggressive is desirable because as I wrote in my other
> reply, currently it is preventing pagecache from accumulating enough dirty
> data for a good IO pattern.

I might even go so far as to say that a small write workload
isn't especially good for solid state storage either.

I know Trond is trying to address NFS re-export performance, but
there appear to be some palpable effects outside of that narrow
use case that need to be considered. Thus a server-side fix,
rather than a fix in the NFS client used to do the re-export,
seems appropriate to consider.

--
Chuck Lever