On Mon, 04 Dec 2023, Al Viro wrote: > On Mon, Dec 04, 2023 at 12:36:41PM +1100, NeilBrown wrote: > > > This means that any cost for doing the work is not imposed on the kernel > > thread, and importantly excessive amounts of work cannot apply > > back-pressure to reduce the amount of new work queued. > > It also means that a stuck ->release() won't end up with stuck > kernel thread... Is a stuck kernel thread any worse than a stuck user-space thread? > > > earlier than would be ideal. When __dput (from the workqueue) calls > > WTF is that __dput thing? __fput, perhaps? Either __fput or dput :-) ->release isn't the problem that I am seeing. The call trace that I see causing problems is __fput -> dput -> dentry_kill -> destroy_inode -> xfs_fs_destroy_inode so both __fput and dput are there, but most of the code is dput related. So both "put"s were swimming in by brain and the wrong combination came out. I changed it to __fput - thanks. > > > This patch adds a new process flag PF_RUNS_TASK_WORK which is now used > > instead of PF_KTHREAD to determine whether it is sensible to queue > > something to task_works. This flag is always set for non-kernel threads. > > *ugh* > > What's that flag for? task_work_add() always can fail; any caller must > have a fallback to cope with that possibility; fput() certainly does. As Oleg pointed out, all threads including kernel threads call task_work_run() at exit, and some kernel threads depend on this. So disabling task_work_add() early for all kernel threads would break things. Currently task_work_add() fails only once the process has started exiting. Only code that might run during the exit handling need check. > > Just have the kernel threads born with ->task_works set to &work_exited > and provide a primitive that would flip it from that to NULL. > > > @@ -1328,7 +1328,7 @@ static void mntput_no_expire(struct mount *mnt) > > > > if (likely(!(mnt->mnt.mnt_flags & MNT_INTERNAL))) { > > struct task_struct *task = current; > > - if (likely(!(task->flags & PF_KTHREAD))) { > > + if (likely((task->flags & PF_RUNS_TASK_WORK))) { > > init_task_work(&mnt->mnt_rcu, __cleanup_mnt); > > if (!task_work_add(task, &mnt->mnt_rcu, TWA_RESUME)) > > return; > > Now, *that* is something I have much stronger objections to. > Stuck filesystem shutdown is far more likely than stuck > ->release(). You are seriously asking for trouble here. > > Why would you want to have nfsd block on that? > I don't *want* nfsd block on that, but I don't care if it does. nfsd will only call task_work_run() at a safe time. This is no different to user-space processes only calling task_work_run() at a safe time. The new flag isn't "I_AM_NFSD" or "QUEUE_FPUT_WORK_TO_TASK". It is "RUNS_TASK_WORK". So any code that would prefer to call task_work_add() but has a fall-back for tasks that don't call run_task_work() should test the new flag. Doing otherwise would be inconsistent and potentially confusing. I don't think that nfsd getting stuck would be any worse than systemd getting stuck, or automount getting stuck, or udiskd getting stuck. Thanks, NeilBrown