On Tue, Nov 28, 2023 at 02:52:59PM +0100, Oleg Nesterov wrote:
> On 11/28, Christian Brauner wrote:
> >
> > Should be simpler if you invert the logic?
> >
> > COMPLETELY UNTESTED
>
> Agreed, this looks much better to me. But perhaps we can just add the new
> PF_KTHREAD_XXX flag and change fput
>
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -445,7 +445,8 @@ void fput(struct file *file)
>  	if (atomic_long_dec_and_test(&file->f_count)) {
>  		struct task_struct *task = current;
>
> -		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> +		if (likely(!in_interrupt() &&
> +			   task->flags & (PF_KTHREAD|PF_KTHREAD_XXX) != PF_KTHREAD) {
>  			init_task_work(&file->f_rcuhead, ____fput);
>  			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
>  				return;
>
> ?
>
> Then nfsd() can simply set PF_KTHREAD_XXX. This looks even simpler to me.

Yeah, I had played with that as well. The only reason I didn't do it was
to avoid adding a new PF_* flag. If that's preferable it might be worth
adding PF_TASK_WORK and decoupling this from PF_KTHREAD entirely. Kthread
creation and userspace process creation have both been based on the same
struct kernel_clone_args ever since we added it for clone3(), so we catch
everything in copy_process():

diff --git a/fs/file_table.c b/fs/file_table.c
index 6deac386486d..5d3eb5ef4fc7 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -437,7 +437,7 @@ void fput(struct file *file)
 		file_free(file);
 		return;
 	}
-	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+	if (likely(!in_interrupt() && (task->flags & PF_TASK_WORK))) {
 		init_task_work(&file->f_rcuhead, ____fput);
 		if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
 			return;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..8dfc06acc6a0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1755,7 +1755,7 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi.
 						 */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
+#define PF_TASK_WORK		0x00800000
 #define PF__HOLE__01000000	0x01000000
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
diff --git a/kernel/fork.c b/kernel/fork.c
index 10917c3e1f03..2604235c800f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2346,6 +2346,14 @@ __latent_entropy struct task_struct *copy_process(
 	if (args->io_thread)
 		p->flags |= PF_IO_WORKER;
 
+	/*
+	 * By default only non-kernel threads can use task work. Kernel
+	 * threads that manage task work explicitly can set the flag in
+	 * their kthread callback.
+	 */
+	if (!args->kthread)
+		p->flags |= PF_TASK_WORK;
+
 	if (args->name)
 		strscpy_pad(p->comm, args->name, sizeof(p->comm));