On Tue, 2023-11-28 at 10:34 -0500, Chuck Lever wrote:
> On Tue, Nov 28, 2023 at 01:57:30PM +1100, NeilBrown wrote:
> > 
> > (trimmed cc...)
> > 
> > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > On Tue, Nov 28, 2023 at 11:16:06AM +1100, NeilBrown wrote:
> > > > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > > > On Tue, Nov 28, 2023 at 09:05:21AM +1100, NeilBrown wrote:
> > > > > > 
> > > > > > I have evidence from a customer site of 256 nfsd threads adding files to
> > > > > > delayed_fput_lists nearly twice as fast as they are retired by a single
> > > > > > work-queue thread running delayed_fput().  As you might imagine this
> > > > > > does not end well (20 million files in the queue at the time a snapshot
> > > > > > was taken for analysis).
> > > > > > 
> > > > > > While this might point to a problem with the filesystem not handling the
> > > > > > final close efficiently, such problems should only hurt throughput, not
> > > > > > lead to memory exhaustion.
> > > > > 
> > > > > I have this patch queued for v6.8:
> > > > > 
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-next&id=c42661ffa58acfeaf73b932dec1e6f04ce8a98c0
> > > > 
> > > > Thanks....
> > > > I think that change is good, but I don't think it addresses the problem
> > > > mentioned in the description, and it is not directly relevant to the
> > > > problem I saw ... though it is complicated.
> > > > 
> > > > The problem "workqueue ... hogged cpu..." probably means that
> > > > nfsd_file_dispose_list() needs a cond_resched() call in the loop.
> > > > That will stop it from hogging the CPU whether it is tied to one CPU or
> > > > free to roam.
> > > > 
> > > > Also that work is calling filp_close() which primarily calls
> > > > filp_flush().
> > > > It also calls fput() but that does minimal work.  If there is much work
> > > > to do then that is offloaded to another work-item.  *That* is the
> > > > work-item that I had problems with.
> > > > 
> > > > The problem I saw was with an older kernel which didn't have the nfsd
> > > > file cache and so probably is calling filp_close() more often.
> > > 
> > > Without the file cache, the filp_close() should be handled directly
> > > by the nfsd thread handling the RPC, IIRC.
> > 
> > Yes - but __fput() is handled by a workqueue.
> > 
> > > > So maybe
> > > > my patch isn't so important now.  Particularly as nfsd now isn't closing
> > > > most files in-task but instead offloads that to another task.  So the
> > > > final fput will not be handled by the nfsd task either.
> > > > 
> > > > But I think there is room for improvement.  Gathering lots of files
> > > > together into a list and closing them sequentially is not going to be as
> > > > efficient as closing them in parallel.
> > > 
> > > I believe the file cache passes the filps to the work queue one at
> > 
> > nfsd_file_close_inode() does.  nfsd_file_gc() and nfsd_file_lru_scan()
> > can pass multiple.
> > 
> > > a time, but I don't think there's anything that forces the work
> > > queue to handle each flush/close completely before proceeding to the
> > > next.
> > 
> > Parallelism with workqueues is controlled by the work items (struct
> > work_struct).  Two different work items can run in parallel.  But any
> > given work item can never run parallel to itself.
> > 
> > The only work items queued on nfsd_filecache_wq are from
> > nn->fcache_disposal->work.
> > There is one of these for each network namespace.  So in any given
> > network namespace, all work on nfsd_filecache_wq is fully serialised.
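
A minimal sketch of the cond_resched() change Neil describes above,
written against a simplified stand-in for nfsd_file_dispose_list()
rather than the actual fs/nfsd/filecache.c code:

    static void
    nfsd_file_dispose_list(struct list_head *dispose)
    {
            struct nfsd_file *nf;

            while (!list_empty(dispose)) {
                    nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
                    list_del_init(&nf->nf_lru);
                    nfsd_file_free(nf);     /* filp_close() + final put */
                    /* Without this, a long dispose list keeps the worker
                     * on the CPU and triggers the "workqueue ... hogged
                     * cpu" warning; with it, the worker yields between
                     * files regardless of which CPU it is bound to. */
                    cond_resched();
            }
    }
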
> 
> OIC, it's that specific case you are concerned with. The per-
> namespace laundrette was added by:
> 
> 9542e6a643fc ("nfsd: Containerise filecache laundrette")
> 
> Its purpose was to confine the close backlog to each container.
> 
> Seems like it would be better if there was a struct work_struct
> in each struct nfsd_file. That wouldn't add real backpressure to
> nfsd threads, but it would enable file closes to run in parallel.
> 

I like this idea. That seems a lot simpler than all of this weirdo
queueing of delayed closes that we do.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
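
A rough sketch of the per-file work item Chuck suggests, assuming the
existing nfsd_file_free() helper and the nfsd_filecache_wq workqueue;
the nf_put_work field and the two helpers below are illustrative names,
not an actual patch:

    struct nfsd_file {
            /* ... existing fields ... */
            struct work_struct      nf_put_work;    /* illustrative field */
    };

    static void
    nfsd_file_put_work(struct work_struct *work)
    {
            struct nfsd_file *nf = container_of(work, struct nfsd_file,
                                                nf_put_work);

            nfsd_file_free(nf);             /* filp_close() and final fput() */
    }

    /* Queue one work item per file: the workqueue can then flush/close
     * many files concurrently instead of serialising them behind the
     * single per-namespace nn->fcache_disposal->work item. */
    static void
    nfsd_file_queue_put(struct nfsd_file *nf)
    {
            INIT_WORK(&nf->nf_put_work, nfsd_file_put_work);
            queue_work(nfsd_filecache_wq, &nf->nf_put_work);
    }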