There is a rare case where current's nsproxy might be NULL but we are
required to check for credentials and capabilities. It sometimes
happens during an exit_group() syscall while destroying a user's
session (logging out).

My understanding is that while we have to lock the task to get a task's
nsproxy and check whether it's NULL, for 'current' we don't have to,
and it's expected not to be NULL. There is code in the kernel currently
that does current->nsproxy->user_ns without any checks. And
include/linux/nsproxy.h confirms that:

  2. when accessing (i.e. reading) current task's namespaces - no
     precautions should be taken - just dereference the pointers

There seems to be no crash currently because of this, but when nsproxy
is accessed from LSM hooks there is one.

This is the backtrace:

 0  smk_tskacc (task=0xffff88003b0b92e0, obj_known=0x2
    <irq_stack_union+2>, mode=2, a=0xffff88003be53dd8) at
    security/smack/smack_access.c:261
 1  0xffffffff8130e2aa in smk_curacc (obj_known=<optimized out>,
    mode=<optimized out>, a=<optimized out>) at
    security/smack/smack_access.c:318
 2  0xffffffff8130a50d in smack_task_kill (p=0xffff88003b0b92e0,
    info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at
    security/smack/smack_lsm.c:2071
 3  0xffffffff812ea4f6 in security_task_kill (p=<optimized out>,
    info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at
    security/security.c:952
 4  0xffffffff8109ac80 in check_kill_permission (sig=15, info=0x0
    <irq_stack_union>, t=0xffff88003b0b8000) at kernel/signal.c:796
 5  0xffffffff8109d3ab in group_send_sig_info (sig=15, info=0x0
    <irq_stack_union>, p=0xffff88003b0b8000) at kernel/signal.c:1296
 6  0xffffffff8108e527 in forget_original_parent (father=<optimized out>)
    at kernel/exit.c:575
 7  exit_notify (group_dead=<optimized out>, tsk=<optimized out>) at
    kernel/exit.c:606
 8  do_exit (code=<optimized out>) at kernel/exit.c:775
 9  0xffffffff8108ec0f in do_group_exit (exit_code=0) at
    kernel/exit.c:891
 10 0xffffffff8108ec84 in SYSC_exit_group (error_code=<optimized out>)
    at kernel/exit.c:902
 11 SyS_exit_group (error_code=<optimized out>) at kernel/exit.c:900

This backtrace shows that the LSM hook task_kill() is invoked during
an exit_group() syscall, and that this happens after
exit_task_namespaces().

LSM hooks that deal with namespaces might need nsproxy to be able to
check for capabilities. At this point that is impossible: current's
nsproxy is already NULL/destroyed.

This is the case because exit_task_namespaces() is called before
exit_notify(), where all of the above happens. This patch changes
their order.

Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@xxxxxxxxxxx>
---
 kernel/exit.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 22fcc05..da1bb18 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -742,7 +742,6 @@ void do_exit(long code)
 	exit_fs(tsk);
 	if (group_dead)
 		disassociate_ctty(1);
-	exit_task_namespaces(tsk);
 	exit_task_work(tsk);
 	exit_thread();
 
@@ -763,6 +762,13 @@ void do_exit(long code)
 
 	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
 	exit_notify(tsk, group_dead);
+
+	/*
+	 * This should be after all things that potentially require
+	 * process's namespaces (e.g. capability checks).
+	 */
+	exit_task_namespaces(tsk);
+
 	proc_exit_connector(tsk);
 #ifdef CONFIG_NUMA
 	task_lock(tsk);
-- 
2.1.0

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
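
For illustration only, the ordering problem described above can be
modelled in user space. This is a rough sketch, not kernel code: every
type and function name below (struct task_model, model_do_exit(), and
so on) is a hypothetical stand-in, and the "hook" merely records
whether nsproxy was still set when it ran.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct nsproxy_model {
	int user_ns_id;
};

struct task_model {
	struct nsproxy_model *nsproxy;
	int hook_saw_nsproxy;	/* -1: hook not run, 0: saw NULL, 1: saw nsproxy */
};

/* Models exit_task_namespaces(): the task drops its namespace proxy. */
static void model_exit_task_namespaces(struct task_model *t)
{
	t->nsproxy = NULL;
}

/* Models an LSM task_kill hook that needs nsproxy for a capability check. */
static void model_task_kill_hook(struct task_model *t)
{
	t->hook_saw_nsproxy = (t->nsproxy != NULL);
}

/* Models exit_notify(): signalling on exit ends up invoking the hook. */
static void model_exit_notify(struct task_model *t)
{
	model_task_kill_hook(t);
}

enum exit_order { ORDER_ORIGINAL, ORDER_PATCHED };

/* Runs the two exit steps in the requested order and reports what the hook saw. */
static int model_do_exit(enum exit_order order)
{
	struct nsproxy_model ns = { .user_ns_id = 1 };
	struct task_model t = { .nsproxy = &ns, .hook_saw_nsproxy = -1 };

	if (order == ORDER_ORIGINAL) {
		/* Namespaces are dropped first, so the hook finds NULL. */
		model_exit_task_namespaces(&t);
		model_exit_notify(&t);
	} else {
		/* Patched order: the hook runs while nsproxy is still valid. */
		model_exit_notify(&t);
		model_exit_task_namespaces(&t);
	}
	return t.hook_saw_nsproxy;
}
```

In this model, model_do_exit(ORDER_ORIGINAL) reports that the hook saw
a NULL nsproxy, while model_do_exit(ORDER_PATCHED) reports that nsproxy
was still set. A real hook would dereference the pointer instead of
just testing it.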