> On Sep 2, 2022, at 12:44 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: > > On 9/1/22 9:32 PM, Chuck Lever III wrote: >> >>> On Sep 1, 2022, at 9:56 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: >>> >>> Hi Chuck, >>> >>> On 8/31/22 7:30 AM, Chuck Lever III wrote: >>>>> struct list_head *pos, *next; >>>>> struct nfs4_client *clp; >>>>> >>>>> - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? >>>>> - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; >>>>> + cb_cnt = atomic_read(&nn->nfsd_client_shrinker_cb_count); >>>>> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients || >>>>> + cb_cnt) { >>>>> + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; >>>>> + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); >>>>> + } >>>> I'm not terribly happy with this, but I don't have a better suggestion >>>> at the moment. Let me think about it. >>> Do you have any suggestion to improve this, I want to incorporate it >>> before sending out v5? >> Let's consider some broad outlines... >> >> With regards to parametrizing the reaplist behavior, you want >> a normal laundromat run to reap zero or more courtesy clients. >> You want a shrinker laundromat run to reap more than zero. I >> think you want a minreap variable as well as a maxreap variable >> in there to control how the reaplist is built. Making @minreap >> a function parameter rather than a global atomic would be a >> plus for me, but maybe that's not practical. > > I'm not quite understand how the minreap is used, I think it > will make the code more complex. > >> >> But I would prefer a more straightforward approach overall. The >> proposed approach seems tricky and brittle, and needs a lot of >> explaining to understand. Other subsystems seem to get away with >> something simpler. >> >> Can nfsd_courtesy_client_count() simply reduce >> nn->nfs4_max_clients, kick the laundromat, and then return 0? >> Then get rid of nfsd_courtesy_client_scan(). > > I need to think more about this approach. However at first glance, > nn->nfs4_max_clients is used to control how many clients, including > active and courtesy clients, are allowed in the system. If we lower > this count, it also prevent new clients from connecting to the > system. So now the shrinker mechanism does more than just getting > rid of unused resources, maybe that's ok? I don't want to go down the path of kicking out active clients if we don't need to. Maybe you can make the laundromat code more careful to make a distinction between active and courtesy clients. It might sense in the long run to separate the laundromat's processing of courtesy and active clients anyway. >> Or, nfsd_courtesy_client_count() could return >> nfsd_couresy_client_count. nfsd_courtesy_client_scan() could >> then look something like this: >> >> if ((sc->gfp_mask & GFP_KERNEL) != GFP_KERNEL) >> return SHRINK_STOP; >> >> nfsd_get_client_reaplist(nn, reaplist, lt); >> list_for_each_safe(pos, next, &reaplist) { >> clp = list_entry(pos, struct nfs4_client, cl_lru); >> trace_nfsd_clid_purged(&clp->cl_clientid); >> list_del_init(&clp->cl_lru); >> expire_client(clp); >> count++; >> } >> return count; > > This does not work, we cannot expire clients on the context of > scan callback due to deadlock problem. Correct, we don't want to start shrinker laundromat activity if the allocation request specified that it cannot wait. But are you sure it doesn't work if sc_gfp_flags is set to GFP_KERNEL? > I will experiment with ways to get rid of the scan function to > make the logic simpler. -- Chuck Lever