Jens Axboe <axboe@xxxxxxxxx> writes:

> On 3/4/21 5:23 AM, Stefan Metzmacher wrote:
>>
>> Hi Jens,
>>
>>> +static pid_t fork_thread(int (*fn)(void *), void *arg)
>>> +{
>>> +	unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
>>> +				CLONE_IO|SIGCHLD;
>>> +	struct kernel_clone_args args = {
>>> +		.flags		= ((lower_32_bits(flags) | CLONE_VM |
>>> +				    CLONE_UNTRACED) & ~CSIGNAL),
>>> +		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
>>> +		.stack		= (unsigned long)fn,
>>> +		.stack_size	= (unsigned long)arg,
>>> +	};
>>> +
>>> +	return kernel_clone(&args);
>>> +}
>>
>> Can you please explain why CLONE_SIGHAND is used here?
>
> We can't have CLONE_THREAD without CLONE_SIGHAND... The io-wq workers
> don't really care about signals, we don't use them internally.
>
>> Will the userspace signal handlers executed from the kernel thread?
>
> No
>
>> Will SIGCHLD be posted to the userspace signal handlers in a userspace
>> process? Will wait() from userspace see the exit of a thread?
>
> Currently actually it does, but I think that's just an oversight. As far
> as I can tell, we want to add something like the below. Untested... I'll
> give this a spin in a bit.

How do you mean?  Where do you see do_notify_parent being called?

It should not happen in exit_notify, as the new threads should be
neither ptraced nor the thread_group_leader.

Nor should do_notify_parent be called from wait_task_zombie, as
PF_IO_WORKER threads are not ptraceable.

Nor should do_notify_parent be called from reparent_leader, as the
PF_IO_WORKER is not the thread_group_leader.

Non-leader threads always autoreap, and their exit_state is either 0
or EXIT_DEAD.

Which leaves calling do_notify_parent in release_task, which is
perfectly appropriate if the io_worker is the last thread in the
thread_group.

I can see modifying eligible_child so __WCLONE will not cause wait to
show the kernel thread.
I don't think wait_task_stopped or wait_task_continued will register on
a PF_IO_WORKER thread if it does not process signals, but I just skimmed
those two functions when I was looking.

It definitely looks like it would be worth modifying do_signal_stop so
that the PF_IO_WORKERs are not included.  Or else modifying the
PF_IO_WORKER threads to stop with the rest of the process in that case.

Eric

> diff --git a/kernel/signal.c b/kernel/signal.c
> index ba4d1ef39a9e..e5db1d8f18e5 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1912,6 +1912,10 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	bool autoreap = false;
>  	u64 utime, stime;
>
> +	/* Don't notify a parent task if an io_uring worker exits */
> +	if (tsk->flags & PF_IO_WORKER)
> +		return true;
> +
>  	BUG_ON(sig == -1);
>
>  	/* do_notify_parent_cldstop should have been called instead. */
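
For reference, the documented behaviour of an ordinary CLONE_THREAD
thread (no SIGCHLD on exit, not visible to wait) can be checked from
userspace with something like the sketch below. It uses a plain pthread
rather than a PF_IO_WORKER, so it says nothing about what the io-wq
workers actually do; the names thread_exit_is_invisible, on_sigchld and
worker are made up for the illustration, and it assumes the calling
process has no children of its own. Older glibc needs -pthread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Set by the handler if any SIGCHLD is delivered to the process. */
static volatile sig_atomic_t sigchld_seen;

static void on_sigchld(int sig)
{
	(void)sig;
	sigchld_seen = 1;
}

/* Hypothetical worker: a non-leader thread that exits immediately. */
static void *worker(void *arg)
{
	return arg;
}

/*
 * Returns 1 if the exit of a non-leader thread raised no SIGCHLD and
 * left nothing for wait to report, 0 otherwise, -1 on setup failure.
 * Assumes the calling process has no child processes.
 */
static int thread_exit_is_invisible(void)
{
	struct sigaction sa = { .sa_handler = on_sigchld };
	struct sigaction old;
	pthread_t t;
	pid_t ret;
	int err;

	sigemptyset(&sa.sa_mask);
	sigchld_seen = 0;

	if (sigaction(SIGCHLD, &sa, &old) < 0)
		return -1;
	if (pthread_create(&t, NULL, worker, NULL) != 0 ||
	    pthread_join(t, NULL) != 0) {
		sigaction(SIGCHLD, &old, NULL);
		return -1;
	}

	/* __WALL asks for every kind of child, clone children included. */
	errno = 0;
	ret = waitpid(-1, NULL, __WALL | WNOHANG);
	err = errno;

	sigaction(SIGCHLD, &old, NULL);	/* restore the previous handler */

	return ret == -1 && err == ECHILD && !sigchld_seen;
}
```

Calling thread_exit_is_invisible() from main() in a process without
children is expected to return 1: waitpid fails with ECHILD and the
SIGCHLD handler never runs.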