Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Wed, Dec 08, 2021 at 02:25:26PM -0600, Eric W. Biederman wrote:
>> When the kernel detects it is oops or otherwise force killing a task
>> while it exits the code poorly attempts to permanently stop the task
>> from scheduling.
>>
>> I say poorly because it is possible for a task in TASK_UINTERRUPTIBLE
>> to be woken up.
>>
>> As it makes no sense for the task to continue call do_task_dead
>> instead which actually does the work and permanently removes the task
>> from the scheduler.  Guaranteeing the task will never be woken
>> up again.
>
> NAK.  This is not all do_task_dead() leads to - see what finish_task_switch()
> does upon seeing TASK_DEAD:
>		/* Task is done with its stack. */
>		put_task_stack(prev);
>
>		put_task_struct_rcu_user(prev);
>
> Now take a look at the comment just before that check for PF_EXITING -
> the point is to leave the task leaked, rather than proceeding with
> freeing the sucker.
>
> We are not going through the normal "turn zombie" motions, including
> waking wait(2) callers up, etc.  Going ahead and freeing it could
> fuck the things up quite badly.

I believe I was thinking this task won't be reaped because release_task
can never be called.  Which, I admit, is not strictly true, depending on
where we oops in do_exit.

We can guarantee the leak with:

	tsk->exit_state = EXIT_DEAD;
	refcount_inc(&tsk->rcu_users);

It just feels wrong to me to have something dead and broken sticking
around on the scheduler queue.  Especially as something could come along
and wake it up, and then what do we do?

Hmm.  I think we want that tsk->exit_state = EXIT_DEAD regardless, to
prevent the task from being reaped and possibly causing more harm.

Eric