Lukasz Pawelczyk <l.pawelczyk@xxxxxxxxxxx> writes:

> There is a rare case where current's nsproxy might be NULL but we are
> required to check for credentials and capabilities. It sometimes happens
> during an exit_group() syscall while destroying user's session (logging
> out).
>
> My understanding is that while we have to lock the task to get task's
> nsproxy and check whether it's NULL, for the 'current' we don't have to
> and it's expected not to be NULL. There is code in the kernel
> currently that does current->nsproxy->user_ns without any checks.
> And include/linux/nsproxy.h confirms that:
>
>   2. when accessing (i.e. reading) current task's namespaces - no
>      precautions should be taken - just dereference the pointers
>
> There seems to be no crash currently because of this, but with accessing
> nsproxy from LSM hooks there is. This is the backtrace:
>
> 0  smk_tskacc (task=0xffff88003b0b92e0, obj_known=0x2 <irq_stack_union+2>, mode=2, a=0xffff88003be53dd8) at security/smack/smack_access.c:261
> 1  0xffffffff8130e2aa in smk_curacc (obj_known=<optimized out>, mode=<optimized out>, a=<optimized out>) at security/smack/smack_access.c:318
> 2  0xffffffff8130a50d in smack_task_kill (p=0xffff88003b0b92e0, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/smack/smack_lsm.c:2071
> 3  0xffffffff812ea4f6 in security_task_kill (p=<optimized out>, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/security.c:952
> 4  0xffffffff8109ac80 in check_kill_permission (sig=15, info=0x0 <irq_stack_union>, t=0xffff88003b0b8000) at kernel/signal.c:796
> 5  0xffffffff8109d3ab in group_send_sig_info (sig=15, info=0x0 <irq_stack_union>, p=0xffff88003b0b8000) at kernel/signal.c:1296
> 6  0xffffffff8108e527 in forget_original_parent (father=<optimized out>) at kernel/exit.c:575
> 7  exit_notify (group_dead=<optimized out>, tsk=<optimized out>) at kernel/exit.c:606
> 8  do_exit (code=<optimized out>) at kernel/exit.c:775
> 9  0xffffffff8108ec0f in do_group_exit (exit_code=0) at kernel/exit.c:891
> 10 0xffffffff8108ec84 in SYSC_exit_group (error_code=<optimized out>) at kernel/exit.c:902
> 11 SyS_exit_group (error_code=<optimized out>) at kernel/exit.c:900
>
> This backtrace clearly shows that there is an LSM hook task_kill() that
> happens during an exit_group() syscall and that this happens after
> exit_task_namespaces(). LSM hooks with namespaces might need nsproxy to
> be able to check for capabilities. At this point this is impossible. The
> current's nsproxy is already NULL/destroyed.
>
> This is the case because exit_task_namespaces() is called before the
> exit_notify() where all of the above happens. This patch changes their
> order.

Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

current->nsproxy->user_ns does not exist, and changing where
exit_task_namespaces is called is fragile. I am really not interested
in messing with it right now to solve a problem that does not exist.

>
> Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@xxxxxxxxxxx>
> ---
>  kernel/exit.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 22fcc05..da1bb18 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -742,7 +742,6 @@ void do_exit(long code)
>  	exit_fs(tsk);
>  	if (group_dead)
>  		disassociate_ctty(1);
> -	exit_task_namespaces(tsk);
>  	exit_task_work(tsk);
>  	exit_thread();
>  
> @@ -763,6 +762,13 @@ void do_exit(long code)
>  
>  	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
>  	exit_notify(tsk, group_dead);
> +
> +	/*
> +	 * This should be after all things that potentially require
> +	 * process's namespaces (e.g. capability checks).
> +	 */
> +	exit_task_namespaces(tsk);
> +
>  	proc_exit_connector(tsk);
>  #ifdef CONFIG_NUMA
>  	task_lock(tsk);
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
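
For readers following the thread, the ordering argument in the quoted patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `model_*` name below is hypothetical and stands in for the kernel function or structure of a similar name, and none of it is kernel code. It shows only why a hook reached from exit_notify() would see a NULL nsproxy when the namespaces are dropped first. It does not address the objection in the reply that mainline has no current->nsproxy->user_ns.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's real structures. */
struct model_nsproxy { int ns_id; };
struct model_task { struct model_nsproxy *nsproxy; };

/* Stand-in for exit_task_namespaces(): the task loses its nsproxy. */
static void model_exit_task_namespaces(struct model_task *t)
{
	t->nsproxy = NULL;
}

/*
 * Stand-in for an LSM hook that needs namespace information.
 * Returns 0 if nsproxy is available, -1 if it is already gone.
 */
static int model_hook_needs_ns(const struct model_task *t)
{
	if (t->nsproxy == NULL)
		return -1;
	return 0;
}

/* Stand-in for exit_notify(): the path from which the hook is reached. */
static int model_exit_notify(const struct model_task *t)
{
	return model_hook_needs_ns(t);
}

/* Ordering before the patch: namespaces are dropped, then exit_notify(). */
static int model_exit_current_order(void)
{
	struct model_nsproxy ns = { 1 };
	struct model_task t = { &ns };

	model_exit_task_namespaces(&t);
	return model_exit_notify(&t);
}

/* Ordering proposed by the patch: exit_notify(), then drop namespaces. */
static int model_exit_proposed_order(void)
{
	struct model_nsproxy ns = { 1 };
	struct model_task t = { &ns };
	int ret = model_exit_notify(&t);

	model_exit_task_namespaces(&t);
	return ret;
}
```

In this model, model_exit_current_order() returns -1 because the hook runs after nsproxy has been cleared, while model_exit_proposed_order() returns 0. The explicit NULL check in the hypothetical hook is the alternative to reordering: it tolerates a missing nsproxy instead of relying on where exit_task_namespaces() is called.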