> On May 28, 2022, at 10:32 PM, Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
> 
> Hi,
> 
>>> On May 27, 2022, at 4:37 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>>> 
>>> On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
>>>> 
>>>> Hi Frank-
>>>> 
>>>> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
>>>> Do you have a reproducer I can try?
>>> 
>>> Hi Chuck,
>>> 
>>> The easiest way to reproduce the issue is to run generic/531 over an
>>> NFSv4 mount, using a system with a large number of CPUs on the client
>>> side (or just scaling the test up manually - it has a calculation based
>>> on the number of CPUs).
>>> 
>>> The test will take a long time to finish. I initially described the
>>> details here:
>>> 
>>> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>>> 
>>> Since then, it was also reported here:
>>> 
>>> https://lore.kernel.org/all/20210531125948.2D37.409509F4@xxxxxxxxxxxx/T/#m8c3e4173696e17a9d5903d2a619550f352314d20
>> 
>> Thanks for the summary. So, there isn't a bugzilla tracking this
>> issue? If not, please create one here:
>> 
>> https://bugzilla.linux-nfs.org/
>> 
>> Then we don't have to keep asking for a repeat summary ;-)
>> 
>> 
>>> I posted an experimental patch, but it's actually not quite correct
>>> (although I think the idea behind it makes sense):
>>> 
>>> https://lore.kernel.org/linux-nfs/20201020183718.14618-4-trondmy@xxxxxxxxxx/T/#m869aa427f125afee2af9a89d569c6b98e12e516f
>> 
>> A handful of random comments:
>> 
>> - nfsd_file_put() is now significantly different than it used
>>   to be, so that part of the patch will have to be updated in
>>   order to apply to v5.18+
> 
> When there are many open files (more than NFSD_FILE_LRU_THRESHOLD),
> nfsd_file_gc() wastes a lot of CPU time.

Thanks for the suggestion. I agree that CPU and memory bandwidth
are not being used effectively by the filecache garbage collector.

> Can we serialize nfsd_file_gc() for v5.18+ as a first step?

If I understand Frank's problem statement correctly, garbage
collection during an nfsd_file_put() under an NFSv4-only workload
walks the length of the LRU list and finds nothing to evict 100%
of the time. That seems like a bug, and fixing it might give us
the most improvement in this area.

> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 3d944ca..6abefd9 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -63,6 +63,8 @@ static struct delayed_work nfsd_filecache_laundrette;
> 
>  static void nfsd_file_gc(void);
> 
> +static atomic_t nfsd_file_gc_queue_delayed = ATOMIC_INIT(0);
> +
>  static void
>  nfsd_file_schedule_laundrette(void)
>  {
> @@ -71,8 +73,10 @@ nfsd_file_schedule_laundrette(void)
>  	if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags))
>  		return;
> 
> -	queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
> +	if(atomic_cmpxchg(&nfsd_file_gc_queue_delayed, 0, 1)==0){
> +		queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
>  			NFSD_LAUNDRETTE_DELAY);

I might misunderstand what this is doing exactly. I'm sure there's
a preferred idiom in the kernel for not queuing a new work item
when one is already running, so that an open-coded cmpxchg is not
necessary.

It might be better to allocate a specific workqueue for filecache
garbage collection, and limit the maximum number of active work
items allowed on that queue to 1. One benefit of that might be
reducing the number of threads contending for the filecache data
structures.
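Untested, and only a sketch of what I mean — the queue name, the
WQ_UNBOUND flag, and where the allocation lives are my guesses, not
a reviewed change:

/*
 * Rough sketch: give the laundrette its own workqueue and cap
 * max_active at 1 so at most one GC work item executes at a time.
 */
static struct workqueue_struct *nfsd_filecache_wq;

int nfsd_file_cache_init(void)
{
	/* WQ_UNBOUND with max_active == 1: work items are serialized */
	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache",
					    WQ_UNBOUND, 1);
	if (!nfsd_filecache_wq)
		return -ENOMEM;
	/* ... existing initialization ... */
	return 0;
}

static void
nfsd_file_schedule_laundrette(void)
{
	long count = atomic_long_read(&nfsd_filecache_count);

	if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags))
		return;

	/*
	 * queue_delayed_work() is already a no-op when the work item
	 * is still pending, so no extra flag is needed to avoid
	 * double-queuing.
	 */
	queue_delayed_work(nfsd_filecache_wq, &nfsd_filecache_laundrette,
			   NFSD_LAUNDRETTE_DELAY);
}

With max_active capped at 1, concurrent callers can still schedule
the laundrette, but the workqueue guarantees only one instance runs
at a time, which would make the open-coded serialization in
nfsd_file_gc() below unnecessary.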
If GC is capped like this, maybe create one GC workqueue per nfsd_net
so GC activity in one namespace does not starve filecache GC in other
namespaces. (A rough sketch of what I mean appears at the end of this
message.)

Before I would take patches like this, though, I'd like to see
performance data demonstrating the problem and the improvement,
presented separately or as part of the patch descriptions.

If you repost, start a separate thread and cc:

M:	Tejun Heo <tj@xxxxxxxxxx>
R:	Lai Jiangshan <jiangshanlai@xxxxxxxxx>

to get review by workqueue domain experts.

> +	}
>  }
> 
>  static void
> @@ -468,15 +472,22 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
>  	return ret;
>  }
> 
> +// nfsd_file_gc() support concurrency, but serialize it
> +atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
>  static void
>  nfsd_file_gc(void)
>  {
> -	nfsd_file_lru_walk_list(NULL);
> +	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1)==0){
> +		nfsd_file_lru_walk_list(NULL);
> +		atomic_set(&nfsd_file_gc_running, 0);
> +	}
>  }
> 
>  static void
>  nfsd_file_gc_worker(struct work_struct *work)
>  {
> +	atomic_set(&nfsd_file_gc_queue_delayed, 0);
> +	// pr_info("nfsd_file_gc_worker\n");
>  	nfsd_file_gc();
>  	nfsd_file_schedule_laundrette();
>  }
> 
> Best Regards
> Wang Yugui (wangyugui@xxxxxxxxxxxx)
> 2022/05/29

--
Chuck Lever
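P.S. The per-nfsd_net arrangement mentioned above might look
something like this. Again untested; the field name and the
init/shutdown helpers are invented purely for illustration:

/* in fs/nfsd/netns.h */
struct nfsd_net {
	/* ... existing fields ... */
	struct workqueue_struct *filecache_wq;	/* hypothetical */
};

/* called from the per-net init path */
static int nfsd_filecache_net_init(struct nfsd_net *nn)
{
	nn->filecache_wq = alloc_workqueue("nfsd_filecache",
					   WQ_UNBOUND, 1);
	if (!nn->filecache_wq)
		return -ENOMEM;
	return 0;
}

/* called from the per-net shutdown path */
static void nfsd_filecache_net_shutdown(struct nfsd_net *nn)
{
	destroy_workqueue(nn->filecache_wq);
	nn->filecache_wq = NULL;
}

Each namespace's laundrette would then queue its delayed work on
nn->filecache_wq, so heavy GC traffic in one container would not
crowd out filecache GC in the others.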