Re: [PATCH V6 01/10] Use copy_process in vhost layer

Mike Christie <michael.christie@xxxxxxxxxx> · Mon, 17 Jan 2022 10:41:05 -0600




On 12/22/21 12:24 PM, Eric W. Biederman wrote:
> All I am certain of is that you need to set
> "args->exit_signal = -1;".  This prevents having to play games with
> do_notify_parent.

Hi Eric,

I have all your review comments handled except this one. It's looking like it's
more difficult than just setting the exit_signal=-1, so I wanted to check that
I understood you.

Here is what I'm currently looking at:

1. I can't just set args->exit_signal to -1, because we end up with a task_struct
that's partially set up like a CLONE_THREAD task. What happens is copy_process
will set the task's exit_signal to -1 and then thread_group_leader() will return
false. When code like the thread_group_leader check in copy_process runs, we will
then go down the CLONE_THREAD paths, which are not set up, and hit crashes.
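
To make the coupling concrete, here is a minimal user-space sketch of it. This
is not kernel code: struct task_model, model_thread_group_leader() and
model_late_setup_path() are made-up names for illustration, and the only thing
taken from the kernel is the rule that the leader test looks at exit_signal.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct task_model {
	int exit_signal;	/* signal sent to the parent on exit, or -1 */
};

/* Same rule as the kernel helper: leader if and only if exit_signal >= 0. */
static bool model_thread_group_leader(const struct task_model *p)
{
	return p->exit_signal >= 0;
}

/* Which late-setup branch a copy_process-like function would pick. */
enum setup_path { PATH_LEADER, PATH_THREAD };

static enum setup_path model_late_setup_path(const struct task_model *p)
{
	/*
	 * The branch is chosen from exit_signal alone, not from clone_flags,
	 * so a task created without CLONE_THREAD but with exit_signal == -1
	 * still lands in the thread branch.
	 */
	return model_thread_group_leader(p) ? PATH_LEADER : PATH_THREAD;
}
```

With exit_signal set to SIGCHLD the model takes PATH_LEADER. With -1 it takes
PATH_THREAD even though nothing else about the task was set up as a thread,
which is the mismatch described above.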

We would need changes like the following, which no longer crash but are not
correct for many reasons. I am just posting this code as an example of the issue
I am hitting.

@@ -1637,11 +1637,13 @@ static void posix_cpu_timers_init_group(struct signal_struct *sig)
 	posix_cputimers_group_init(pct, cpu_limit);
 }
 
-static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_signal(unsigned long clone_flags, struct task_struct *tsk,
+		       struct kernel_clone_args *args)
 {
 	struct signal_struct *sig;
 
-	if (clone_flags & CLONE_THREAD)
+	if (clone_flags & CLONE_THREAD || args->exit_signal == -1)
 		return 0;
 
 	sig = kmem_cache_zalloc(signal_cachep, GFP_KERNEL);
@@ -2194,7 +2244,7 @@ static __latent_entropy struct task_struct *copy_process(
 	retval = copy_sighand(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_fs;
-	retval = copy_signal(clone_flags, p);
+	retval = copy_signal(clone_flags, p, args);
 	if (retval)
 		goto bad_fork_cleanup_sighand;
 	retval = copy_mm(clone_flags, p);
@@ -2277,6 +2327,9 @@ static __latent_entropy struct task_struct *copy_process(
 	if (clone_flags & CLONE_THREAD) {
 		p->group_leader = current->group_leader;
 		p->tgid = current->tgid;
+	} else if (args->exit_signal == -1) {
+		p->group_leader = current->group_leader;
+		p->tgid = p->pid;
 	} else {
 		p->group_leader = p;
 		p->tgid = p->pid;


2. Instead of #1, I could add some code where we just set
task_struct->exit_signal to -1. We could do this towards the end of copy_process
or after it has returned, but before we call do_exit. However, that will have
similar issues as #1 during the exit handling.

For example, __exit_signal will call thread_group_leader which would return false.
__unhash_process would then not detach the pid and we would later hit crashes due
to the task_struct being freed already. I could add code like above to the exit related
code paths, but it gets messy like above.

3. I thought I could separate the leader detection from the exit signal by adding
a flag/field to kernel_clone_args and task_struct. But then I get to the point
where I just need a check for USER/VHOST_WORKER tasks in exit_notify which is
similar to the patch you didn't like where I added the check in do_notify_parent.
So I thought you might not like this approach.

Note:
We can't set our task's exit_signal to SIGCHLD and get autoreaped as suggested in
another mail. The original idea for the do_notify_parent change was that we wanted
the behavior that kthreads have where they get autoreaped on exit. kthreads get
autoreaped there because the kthreadd task that is the parent ignores all signals
and so we hit the parent SIG_IGN check:

        psig = tsk->parent->sighand;
        spin_lock_irqsave(&psig->siglock, flags);
        if (!tsk->ptrace && sig == SIGCHLD &&
            (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
             (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {

Our parent, the qemu task, does not ignore SIGCHLD and so will not hit the code above.
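
As a rough user-space model of that check (again not kernel code: struct
parent_model, enum sigchld_disposition and model_autoreap() are hypothetical
names, and the parent's SIGCHLD state is reduced to two fields), the difference
between a kthreadd parent and a qemu parent looks like this:

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical summary of how the parent treats SIGCHLD. */
enum sigchld_disposition { DISP_DEFAULT, DISP_IGNORE, DISP_HANDLER };

struct parent_model {
	enum sigchld_disposition sigchld;	/* models sa_handler for SIGCHLD */
	bool nocldwait;				/* models SA_NOCLDWAIT in sa_flags */
};

/*
 * Models the condition quoted above: the child is autoreaped only if it is
 * not ptraced, its exit signal is SIGCHLD, and the parent either ignores
 * SIGCHLD or has asked not to wait for children.
 */
static bool model_autoreap(int exit_sig, bool ptraced,
			   const struct parent_model *parent)
{
	return !ptraced && exit_sig == SIGCHLD &&
	       (parent->sigchld == DISP_IGNORE || parent->nocldwait);
}
```

A parent with { DISP_IGNORE, false } (the kthreadd case) makes model_autoreap()
return true, while a parent with { DISP_HANDLER, false } or { DISP_DEFAULT, false }
(the qemu case, assuming it does not set SA_NOCLDWAIT) makes it return false, so
the exiting task would wait to be reaped.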

4. Maybe I am going in the wrong direction and we need kthreads. I could add a:

if (!is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)))
	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);

to vhost.c or to kthread.c when some new arg is passed in.


What do you think?