On Mon, 06 Jan 2025, Chuck Lever wrote: > On 1/5/25 6:11 PM, NeilBrown wrote: > > Under a high NFSv3 load with lots of different file being accessed The > > list_lru of garbage-collectable files can become quite long. > > > > Asking lisT_lru_scan() to scan the whole list can result in a long > > period during which a spinlock is held and no scheduling is possible. > > This is impolite. > > > > So only ask list_lru_scan() to scan 1024 entries at a time, and repeat > > if necessary - calling cond_resched() each time. > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > > fs/nfsd/filecache.c | 17 ++++++++++++----- > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > > index a1cdba42c4fa..e99a86798e86 100644 > > --- a/fs/nfsd/filecache.c > > +++ b/fs/nfsd/filecache.c > > @@ -543,11 +543,18 @@ nfsd_file_gc(void) > > { > > LIST_HEAD(dispose); > > unsigned long ret; > > - > > - ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, > > - &dispose, list_lru_count(&nfsd_file_lru)); > > - trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); > > - nfsd_file_dispose_list_delayed(&dispose); > > + unsigned long cnt = list_lru_count(&nfsd_file_lru); > > + > > + while (cnt > 0) { > > Since @cnt is unsigned, it should be safe to use "while (cnt) {" "while (cnt > 0) {" is equally safe and for me it emphasises that it is a decreasing counter, not a Bool. But different people have different tastes. > > (I might use "total" and "remaining" here -- "cnt", when said aloud, > leaves me snickering). "remaining" would be better than "cnt". > > > > + unsigned long num_to_scan = min(cnt, 1024UL); > > I see long delays with fewer than 1024 items on the list. I might > drop this number by one or two orders of magnitude. And make it a > symbolic constant. In that case I seriously wonder if this is where the delays are coming from. nfsd_file_dispose_list_delayed() does take and drop a spinlock repeatedly (though it may not always be the same lock) and call svc_wake_up() repeatedly - although the head of the queue might already be woken. We could optimise that to detect runs with the same nn and only take the lock once, and only wake_up once. > > There's another naked integer (8) in nfsd_file_net_dispose() -- how does > that relate to this new cap? Should that also be a symbolic constant? I don't think they relate. The trade-off with "8" is: a bigger number might block an nfsd thread for longer, forcing serialising when the work can usefully be done in parallel. a smaller number might needlessly wake lots of threads to share out a tiny amount of work. The 1024 is simply about "don't hold a spinlock for too long". > > > > + ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, > > + &dispose, num_to_scan); > > + trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); > > + nfsd_file_dispose_list_delayed(&dispose); > > I need to go back and review the function traces to see where the > delays add up -- to make sure rescheduling here, rather than at some > other point, is appropriate. It probably is, but my memory fails me > these days. I would like to see those function traces too. > > > > + cnt -= num_to_scan; > > + if (cnt) > > + cond_resched(); > > Another approach might be to poke the laundrette again and simply > exit here, but I don't feel strongly about that. That wouldn't work without storing the 'remaining' count somewhere. With the current design we need to scan the whole list every 2 seconds. Thanks, NeilBrown > > > > + } > > } > > > > static void > > > -- > Chuck Lever >