Re: filecache LRU performance regression

Wang Yugui <wangyugui@xxxxxxxxxxxx> · Mon, 30 May 2022 09:36:50 +0800

Hi,

> 
> > On May 28, 2022, at 10:32 PM, Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> >>> On May 27, 2022, at 4:37 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> >>> 
> >>> On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
> >>>> 
> >>>> Hi Frank-
> >>>> 
> >>>> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
> >>>> Do you have a reproducer I can try?
> >>> 
> >>> Hi Chuck,
> >>> 
> >>> The easiest way to reproduce the issue is to run generic/531 over an
> >>> NFSv4 mount, using a system with a larger number of CPUs on the client
> >>> side (or just scaling the test up manually - it has a calculation based
> >>> on the number of CPUs).
> >>> 
> >>> The test will take a long time to finish. I initially described the
> >>> details here:
> >>> 
> >>> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> >>> 
> >>> Since then, it was also reported here:
> >>> 
> >>> https://lore.kernel.org/all/20210531125948.2D37.409509F4@xxxxxxxxxxxx/T/#m8c3e4173696e17a9d5903d2a619550f352314d20
> >> 
> >> Thanks for the summary. So, there isn't a bugzilla tracking this
> >> issue? If not, please create one here:
> >> 
> >>  https://bugzilla.linux-nfs.org/
> >> 
> >> Then we don't have to keep asking for a repeat summary ;-)
> >> 
> >> 
> >>> I posted an experimental patch, but it's actually not quite correct
> >>> (although I think the idea behind it is makes sense):
> >>> 
> >>> https://lore.kernel.org/linux-nfs/20201020183718.14618-4-trondmy@xxxxxxxxxx/T/#m869aa427f125afee2af9a89d569c6b98e12e516f
> >> 
> >> A handful of random comments:
> >> 
> >> - nfsd_file_put() is now significantly different than it used
> >>  to be, so that part of the patch will have to be updated in
> >>  order to apply to v5.18+
> > 
> > When many open files(>NFSD_FILE_LRU_THRESHOLD),
> > nfsd_file_gc() will waste many CPU times.
> 
> Thanks for the suggestion. I agree that CPU and
> memory bandwidth are not being used effectively
> by the filecache garbage collector.
> 
> 
> > Can we serialize nfsd_file_gc() for v5.18+ as a first step?
> 
> If I understand Frank's problem statement correctly,
> garbage collection during an nfsd_file_put() under
> an NFSv4-only workload walks the length of the LRU
> list and finds nothing to evict 100% of the time.
> That seems like a bug, and fixing it might give us
> the most improvement in this area.
> 
> 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 3d944ca..6abefd9 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -63,6 +63,8 @@ static struct delayed_work		nfsd_filecache_laundrette;
> > 
> > static void nfsd_file_gc(void);
> > 
> > +static atomic_t nfsd_file_gc_queue_delayed = ATOMIC_INIT(0);
> > +
> > static void
> > nfsd_file_schedule_laundrette(void)
> > {
> > @@ -71,8 +73,10 @@ nfsd_file_schedule_laundrette(void)
> > 	if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags))
> > 		return;
> > 
> > -	queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
> > +	if(atomic_cmpxchg(&nfsd_file_gc_queue_delayed, 0, 1)==0){
> > +		queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
> > 			NFSD_LAUNDRETTE_DELAY);
> 
> I might misunderstand what this is doing exactly.
> I'm sure there's a preferred idiom in the kernel
> for not queuing a new work item when one is already
> running, so that an open-coded cmpxchg is not
> necessary.

Thanks. I dropped queue_delayed_work related change and reposted it.

> It might be better to allocate a specific workqueue
> for filecache garbage collection, and limit the
> maximum number of active work items allowed on that
> queue to 1. One benefit of that might be reducing
> the number of threads contending for the filecache
> data structures.

In this way, the number of threads of filecache garbage collection is
reduced. but filecache is still accessed by all nfsd threads for filecache
fetch/save.

Best Regards
Wang Yugui (wangyugui@xxxxxxxxxxxx)
2022/05/30

> If GC is capped like this, maybe create one GC
> workqueue per nfsd_net so GC activity in one
> namespace does not starve filecache GC in other
> namespaces.
> 
> Before I would take patches like this, though,
> performance data demonstrating a problem and some
> improvement should be presented separately or as
> part of the patch descriptions.

> If you repost, start a separate thread and cc:
> 
> M:      Tejun Heo <tj@xxxxxxxxxx>
> R:      Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> 
> to get review by workqueue domain experts.
> 
>