Re: [PATCH 04/10] exit: Stop poorly open coding do_task_dead in make_task_dead

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Wed, 05 Jan 2022 16:33:19 -0600

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Wed, Dec 08, 2021 at 02:25:26PM -0600, Eric W. Biederman wrote:
>> When the kernel detects it is oopsing or otherwise force killing a task
>> while it exits, the code poorly attempts to permanently stop the task
>> from scheduling.
>> 
>> I say poorly because it is possible for a task in TASK_UNINTERRUPTIBLE
>> to be woken up.
>> 
>> As it makes no sense for the task to continue, call do_task_dead
>> instead, which actually does the work and permanently removes the task
>> from the scheduler, guaranteeing the task will never be woken
>> up again.
>
> NAK.  This is not all do_task_dead() leads to - see what finish_task_switch()
> does upon seeing TASK_DEAD:
> 		/* Task is done with its stack. */
> 		put_task_stack(prev);
> 		put_task_struct_rcu_user(prev);
>
>
> Now take a look at the comment just before that check for PF_EXITING -
> the point is to leave the task leaked, rather than proceeding with
> freeing the sucker.
>
> We are not going through the normal "turn zombie" motions, including
> waking wait(2) callers up, etc.  Going ahead and freeing it could
> fuck the things up quite badly.
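To make the objection concrete, here is a userspace toy model (not kernel code) of the TASK_DEAD branch quoted above: when the dead task is switched out for the last time, the scheduler drops its stack reference and an rcu_users reference, and if those were the last ones the memory goes away. All names (`toy_task`, `toy_finish_task_switch`, and so on) are hypothetical stand-ins, and only those two reference drops are modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the task states involved; not the kernel's values. */
enum toy_state { TOY_TASK_RUNNING, TOY_TASK_UNINTERRUPTIBLE, TOY_TASK_DEAD };

/* Toy task: only the fields needed to show the two reference drops. */
struct toy_task {
	enum toy_state state;
	int stack_refs;     /* models the reference held on the task's stack */
	int rcu_users;      /* models tsk->rcu_users, as a plain int */
	bool stack_freed;
	bool struct_freed;
};

static void toy_put_task_stack(struct toy_task *t)
{
	if (--t->stack_refs == 0)
		t->stack_freed = true;
}

static void toy_put_task_struct_rcu_user(struct toy_task *t)
{
	/* The kernel defers the final free via RCU; the toy just records it. */
	if (--t->rcu_users == 0)
		t->struct_freed = true;
}

/* Models only the TASK_DEAD branch; a task in any other state is untouched. */
static void toy_finish_task_switch(struct toy_task *prev)
{
	if (prev->state == TOY_TASK_DEAD) {
		/* Task is done with its stack. */
		toy_put_task_stack(prev);
		toy_put_task_struct_rcu_user(prev);
	}
}
```

Starting a TOY_TASK_DEAD task with `rcu_users == 1` models the case where the reaping side already dropped its reference (an assumption of the model); one switch then frees both the stack and the struct, even though nobody was told the task exited. A task left in TOY_TASK_UNINTERRUPTIBLE keeps both.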

I believe I was thinking this task won't be reaped because release_task
can never be called, which, I admit, is not strictly true depending on
where in do_exit we oops.

We can guarantee the leak with:

	tsk->exit_state = EXIT_DEAD;
	refcount_inc(&tsk->rcu_users);
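A userspace toy model of why those two lines guarantee the leak: the extra rcu_users reference is never dropped, so when the scheduler puts its own reference on the final switch the count cannot reach zero. The names are hypothetical, a plain int stands in for refcount_t (ignoring its saturation and ordering semantics), and the worst case assumed is that only the scheduler's reference remains.

```c
#include <stdbool.h>

/* Toy exit states; values are arbitrary, not the kernel's. */
enum toy_exit_state { TOY_EXIT_NONE, TOY_EXIT_ZOMBIE, TOY_EXIT_DEAD };

struct toy_task {
	enum toy_exit_state exit_state;
	int rcu_users;      /* plain int standing in for refcount_t */
	bool struct_freed;
};

/* Hypothetical model of the two lines above: mark the task dead so that
 * nobody reaps it, and take an extra rcu_users reference that is never
 * dropped, so the struct is deliberately leaked. */
static void toy_guarantee_leak(struct toy_task *tsk)
{
	tsk->exit_state = TOY_EXIT_DEAD;
	tsk->rcu_users++;
}

/* Models the scheduler dropping its rcu_users reference when the dead
 * task is switched out for the last time. */
static void toy_scheduler_put(struct toy_task *tsk)
{
	if (--tsk->rcu_users == 0)
		tsk->struct_freed = true;
}
```

With the extra reference, the count settles at 1 forever and the struct is never freed; without it, the same scheduler put frees the struct.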

It just feels wrong to me to have something dead and broken sticking around
in the scheduler queue, especially as something could come along and wake
it up, and then what do we do?

Hmm.  I think we want tsk->exit_state = EXIT_DEAD regardless, to
prevent the task from being reaped and possibly causing more harm.
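A toy sketch of why forcing EXIT_DEAD blocks reaping. As a simplification, it assumes a reaper only claims tasks in EXIT_ZOMBIE and moves them to EXIT_DEAD as it does so; the names are hypothetical and this is not the real do_wait() logic. The scenario is an oops after the task already became a zombie, where a waiting parent could otherwise go on to tear it down.

```c
#include <stdbool.h>

/* Toy exit states; values are arbitrary, not the kernel's. */
enum toy_exit_state { TOY_EXIT_NONE, TOY_EXIT_ZOMBIE, TOY_EXIT_DEAD };

struct toy_task {
	enum toy_exit_state exit_state;
	bool reaped;
};

/* Hypothetical reaper: in the spirit of wait(2), it only reaps zombies and
 * claims the task by moving it to EXIT_DEAD before tearing it down. */
static bool toy_try_reap(struct toy_task *t)
{
	if (t->exit_state != TOY_EXIT_ZOMBIE)
		return false;   /* still running, or already dead: leave it alone */
	t->exit_state = TOY_EXIT_DEAD;
	t->reaped = true;
	return true;
}

/* Hypothetical oops path: force EXIT_DEAD so no reaper will touch the task. */
static void toy_oops_mark_dead(struct toy_task *t)
{
	t->exit_state = TOY_EXIT_DEAD;
}
```

A normal zombie is reaped once. A zombie that oopsed and was forced to EXIT_DEAD is skipped, so the broken task is never handed to the code that frees it.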

Eric