On Tue, 05 Dec 2023, Dave Chinner wrote:
> On Mon, Dec 04, 2023 at 12:36:41PM +1100, NeilBrown wrote:
> > User-space processes always call task_work_run() as needed when
> > returning from a system call.  Kernel threads generally do not.
> > Because of this, some work that is best run in the task_work context
> > (where it is guaranteed that no locks are held) cannot be queued to
> > task_work from kernel threads, and so is queued to a (single) work
> > item to be managed on a work queue.
> >
> > This means that any cost of doing the work is not imposed on the
> > kernel thread, and importantly that excessive amounts of work cannot
> > apply back-pressure to reduce the amount of new work queued.
> >
> > I have evidence from a customer site where nfsd (which runs as
> > kernel threads) was asked to modify many millions of files, causing
> > sufficient memory pressure that some cache (in XFS, I think) was
> > cleaned earlier than would be ideal.  When __dput (from the
> > workqueue) calls __dentry_kill, xfs_fs_destroy_inode() needs to
> > synchronously read back previously cached info from storage.
>
> We fixed that specific XFS problem in 5.9.
>
> https://lore.kernel.org/linux-xfs/20200622081605.1818434-1-david@xxxxxxxxxxxxx/

Good to know - thanks.

> Can you reproduce these issues on a current TOT kernel?

I haven't tried.  I don't know whether I know enough details of the
workload to attempt it.

> If not, there's no bugs to fix in the upstream kernel. If you can,
> then we've got more XFS issues to work through and fix.
>
> Fundamentally, though, we should not be papering over an XFS issue
> by changing how core task_work infrastructure is used. So let's deal
> with the XFS issue first....

I disagree.  This customer experience has demonstrated both a bug in
XFS and a bug in the interaction between fput, task_work, and nfsd.

If a bug in a filesystem that causes only a modest performance impact
when used through the syscall API can bring the system to its knees
through memory exhaustion when used by nfsd, then that is a robustness
issue for nfsd.  I want to fix that robustness issue so that unusual
behaviour in filesystems does not cause out-of-proportion bad behaviour
in nfsd.

I highlighted this in the cover letter to the first version of my patch:

  https://lore.kernel.org/all/170112272125.7109.6245462722883333440@xxxxxxxxxxxxxxxxxxxxx/

    While this might point to a problem with the filesystem not handling
    the final close efficiently, such problems should only hurt
    throughput, not lead to memory exhaustion.

Thanks,
NeilBrown


> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
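
For context, here is a minimal sketch of the deferral pattern the
quoted patch description refers to.  This is not the actual
fs/file_table.c code; the example_* names and the callback_head
parameter are hypothetical, and only the split between task_work (for
user tasks, which throttles the caller at syscall exit) and a
workqueue (for kernel threads, with no back-pressure) is taken from
the discussion above.

    /*
     * Illustrative sketch only -- not the real kernel implementation.
     * A user task queues the final-put work as task_work, so the cost
     * is paid when it returns to user space.  A kernel thread never
     * returns to user space, so the work is punted to a workqueue,
     * where nothing applies back-pressure to the producer.
     */
    #include <linux/sched.h>
    #include <linux/task_work.h>
    #include <linux/workqueue.h>

    static void example_final_put(struct callback_head *cb)
    {
            /* expensive cleanup (e.g. __dput -> __dentry_kill) here */
    }

    static void example_worker(struct work_struct *work)
    {
            /* same cleanup, but run from a workqueue thread */
    }

    static DECLARE_WORK(example_work, example_worker);

    /* @cb would be embedded in the object being released */
    static void example_deferred_put(struct callback_head *cb)
    {
            if (!(current->flags & PF_KTHREAD)) {
                    init_task_work(cb, example_final_put);
                    if (!task_work_add(current, cb, TWA_RESUME))
                            return; /* runs at syscall exit */
                    /* task is exiting; fall back to the workqueue */
            }
            schedule_work(&example_work); /* kthread: no throttling */
    }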