> On Mar 31, 2022, at 9:09 AM, Jan Kara <jack@xxxxxxx> wrote: > > On Wed 30-03-22 22:02:39, Trond Myklebust wrote: >> On Wed, 2022-03-30 at 17:56 +0000, Chuck Lever III wrote: >>> >>> >>>> On Mar 30, 2022, at 12:19 PM, Jan Kara <jack@xxxxxxx> wrote: >>>> >>>> On Wed 30-03-22 15:38:38, Chuck Lever III wrote: >>>>>> On Mar 30, 2022, at 11:03 AM, Trond Myklebust >>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>>> >>>>>> On Wed, 2022-03-30 at 12:34 +0200, Jan Kara wrote: >>>>>>> Hello, >>>>>>> >>>>>>> during our performance testing we have noticed that commit >>>>>>> b6669305d35a >>>>>>> ("nfsd: Reduce the number of calls to nfsd_file_gc()") has >>>>>>> introduced >>>>>>> a >>>>>>> performance regression when a client does random buffered >>>>>>> writes. The >>>>>>> workload on NFS client is fio running 4 processed doing >>>>>>> random >>>>>>> buffered writes to 4 >>>>>>> different files and the files are large enough to hit dirty >>>>>>> limits >>>>>>> and >>>>>>> force writeback from the client. In particular the invocation >>>>>>> is >>>>>>> like: >>>>>>> >>>>>>> fio --direct=0 --ioengine=sync --thread --directory=/mnt/mnt1 >>>>>>> -- >>>>>>> invalidate=1 --group_reporting=1 --runtime=300 -- >>>>>>> fallocate=posix -- >>>>>>> ramp_time=10 --name=RandomReads-128000-4k-4 --new_group -- >>>>>>> rw=randwrite --size=4000m --numjobs=4 --bs=4k -- >>>>>>> filename_format=FioWorkloads.\$jobnum --end_fsync=1 >>>>>>> >>>>>>> The reason why commit b6669305d35a regresses performance is >>>>>>> the >>>>>>> filemap_flush() call it adds into nfsd_file_put(). Before >>>>>>> this commit >>>>>>> writeback on the server happened from nfsd_commit() code >>>>>>> resulting in >>>>>>> rather long semisequential streams of 4k writes. After commit >>>>>>> b6669305d35a >>>>>>> all the writeback happens from filemap_flush() calls >>>>>>> resulting in >>>>>>> much >>>>>>> longer average seek distance (IO from different files is more >>>>>>> interleaved) >>>>>>> and about 16-20% regression in the achieved writeback >>>>>>> throughput when >>>>>>> the >>>>>>> backing store is rotational storage. >>>>>>> >>>>>>> I think the filemap_flush() from nfsd_file_put() is indeed >>>>>>> rather >>>>>>> aggressive and I think we'd be better off to just leave >>>>>>> writeback to >>>>>>> either >>>>>>> nfsd_commit() or standard dirty page cleaning happening on >>>>>>> the >>>>>>> system. I >>>>>>> assume the rationale for the filemap_flush() call was to make >>>>>>> it more >>>>>>> likely the file can be evicted during the garbage collection >>>>>>> run? Was >>>>>>> there >>>>>>> any particular problem leading to addition of this call or >>>>>>> was it >>>>>>> just "it >>>>>>> seemed like a good idea" thing? >>>>>>> >>>>>>> Thanks in advance for ideas. >>>>>>> >>>>>>> >>>>>>> Honza >>>>>> >>>>>> It was mainly introduced to reduce the amount of work that >>>>>> nfsd_file_free() needs to do. In particular when re-exporting >>>>>> NFS, the >>>>>> call to filp_close() can be expensive because it synchronously >>>>>> flushes >>>>>> out dirty pages. That again means that some of the calls to >>>>>> nfsd_file_dispose_list() can end up being very expensive >>>>>> (particularly >>>>>> the ones run by the garbage collector itself). >>>>> >>>>> The "no regressions" rule suggests that some kind of action needs >>>>> to be taken. I don't have a sense of whether Jan's workload or >>>>> NFS >>>>> re-export is the more common use case, however. >>>>> >>>>> I can see that filp_close() on a file on an NFS mount could be >>>>> costly if that file has dirty pages, due to the NFS client's >>>>> "flush on close" semantic. >>>>> >>>>> Trond, are there alternatives to flushing in the nfsd_file_put() >>>>> path? I'm willing to entertain bug fix patches rather than a >>>>> mechanical revert of b6669305d35a. >>>> >>>> Yeah, I don't think we need to rush fixing this with a revert. >>> >>> Sorry I wasn't clear: I would prefer to apply a bug fix over >>> sending a revert commit, and I do not have enough information >>> yet to make that choice. Waiting a bit is OK with me. >>> >>> >>>> Also because >>>> just removing the filemap_flush() from nfsd_file_put() would keep >>>> other >>>> benefits of that commit while fixing the regression AFAIU. But I >>>> think >>>> making flushing less aggressive is desirable because as I wrote in >>>> my other >>>> reply, currently it is preventing pagecache from accumulating >>>> enough dirty >>>> data for a good IO pattern. >>> >>> I might even go so far as to say that a small write workload >>> isn't especially good for solid state storage either. >>> >>> I know Trond is trying to address NFS re-export performance, but >>> there appear to be some palpable effects outside of that narrow >>> use case that need to be considered. Thus a server-side fix, >>> rather than a fix in the NFS client used to do the re-export, >>> seems appropriate to consider. >> >> Turns out it is not just the NFS client that is the problem. It is >> rather that we need in general to be able to detect flush errors and >> either report them directly (through commit) or we need to change the >> boot verifier to force clients to resend the unstable writes. >> >> Hence, I think we're looking at something like this: > > Thanks for the fix! I've run the patch through some testing and your fix > indeed restores the good IO pattern and returns the performance back to > original levels. So feel free to add: > > Tested-by: Jan Kara <jack@xxxxxxx> Excellent. I'll queue this up in the NFSD tree for 5.17-rc. > > Honza > >> >> 8<-------------------------------------------------------------------- >> From c0c89267f303432c8f5e490ea9b075856e4be79d Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >> Date: Wed, 30 Mar 2022 16:55:38 -0400 >> Subject: [PATCH] nfsd: Fix a write performance regression >> >> The call to filemap_flush() in nfsd_file_put() is there to ensure that >> we clear out any writes belonging to a NFSv3 client relatively quickly >> and avoid situations where the file can't be evicted by the garbage >> collector. It also ensures that we detect write errors quickly. >> >> The problem is this causes a regression in performance for some >> workloads. >> >> So try to improve matters by deferring writeback until we're ready to >> close the file, and need to detect errors so that we can force the >> client to resend. >> >> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >> --- >> fs/nfsd/filecache.c | 18 +++++++++++++++--- >> 1 file changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c >> index 8bc807c5fea4..9578a6317709 100644 >> --- a/fs/nfsd/filecache.c >> +++ b/fs/nfsd/filecache.c >> @@ -235,6 +235,13 @@ nfsd_file_check_write_error(struct nfsd_file *nf) >> return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); >> } >> >> +static void >> +nfsd_file_flush(struct nfsd_file *nf) >> +{ >> + if (nf->nf_file && vfs_fsync(nf->nf_file, 1) != 0) >> + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); >> +} >> + >> static void >> nfsd_file_do_unhash(struct nfsd_file *nf) >> { >> @@ -302,11 +309,14 @@ nfsd_file_put(struct nfsd_file *nf) >> return; >> } >> >> - filemap_flush(nf->nf_file->f_mapping); >> is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0; >> - nfsd_file_put_noref(nf); >> - if (is_hashed) >> + if (!is_hashed) { >> + nfsd_file_flush(nf); >> + nfsd_file_put_noref(nf); >> + } else { >> + nfsd_file_put_noref(nf); >> nfsd_file_schedule_laundrette(); >> + } >> if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) >> nfsd_file_gc(); >> } >> @@ -327,6 +337,7 @@ nfsd_file_dispose_list(struct list_head *dispose) >> while(!list_empty(dispose)) { >> nf = list_first_entry(dispose, struct nfsd_file, nf_lru); >> list_del(&nf->nf_lru); >> + nfsd_file_flush(nf); >> nfsd_file_put_noref(nf); >> } >> } >> @@ -340,6 +351,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose) >> while(!list_empty(dispose)) { >> nf = list_first_entry(dispose, struct nfsd_file, nf_lru); >> list_del(&nf->nf_lru); >> + nfsd_file_flush(nf); >> if (!refcount_dec_and_test(&nf->nf_ref)) >> continue; >> if (nfsd_file_free(nf)) >> -- >> 2.35.1 >> >> >> >> -- >> Trond Myklebust >> Linux NFS client maintainer, Hammerspace >> trond.myklebust@xxxxxxxxxxxxxxx >> >> > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR -- Chuck Lever