On Sun, 2020-03-08 at 16:38 -0500, Eric W. Biederman wrote: > The cred_guard_mutex is problematic. The cred_guard_mutex is held > over the userspace accesses as the arguments from userspace are read. > The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > threads are killed. The cred_guard_mutex is held over > "put_user(0, tsk->clear_child_tid)" in exit_mm(). > > Any of those can result in deadlock, as the cred_guard_mutex is held > over a possible indefinite userspace waits for userspace. > > Add exec_update_mutex that is only held over exec updating process > with the new contents of exec, so that code that needs not to be > confused by exec changing the mm and the cred in ways that can not > happen during ordinary execution of a process. > > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still > being careful and not introducing any regressions. > > Link: https://lore.kernel.org/lkml/20160921152946.GA24210@xxxxxxxxxxxxxx/ > Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@xxxxxxxxxx/ > Link: https://lore.kernel.org/lkml/20160923095031.GA14923@xxxxxxxxxx/ > Link: https://lore.kernel.org/lkml/20170213141452.GA30203@xxxxxxxxxx/ > Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") > Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> This patch will trigger a warning during boot, [ 19.707214][ T1] pci 0035:01:00.0: enabling device (0545 -> 0547) [ 19.707287][ T1] EEH: Capable adapter found: recovery enabled. [ 19.732541][ T1] cpuidle-powernv: Default stop: psscr = 0x0000000000000330,mask=0x00000000003003ff [ 19.732567][ T1] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300375,mask=0x00000000003003ff [ 19.732598][ T1] cpuidle-powernv: First stop level that may lose SPRs = 0x4 [ 19.732617][ T1] cpuidle-powernv: First stop level that may lose timebase = 0x10 [ 19.769784][ T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 19.769810][ T1] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 19.789344][ T718] [ 19.789367][ T718] ===================================== [ 19.789379][ T718] WARNING: bad unlock balance detected! [ 19.789393][ T718] 5.6.0-rc5-next-20200311+ #4 Not tainted [ 19.789414][ T718] ------------------------------------- [ 19.789426][ T718] kworker/u257:0/718 is trying to release lock (&sig- >exec_update_mutex) at: [ 19.789459][ T718] [<c0000000004c6770>] free_bprm+0xe0/0xf0 [ 19.789481][ T718] but there are no more locks to release! [ 19.789502][ T718] [ 19.789502][ T718] other info that might help us debug this: [ 19.789537][ T718] 1 lock held by kworker/u257:0/718: [ 19.789558][ T718] #0: c000001fa8842808 (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file.isra.33+0x1b0/0xda0 [ 19.789611][ T718] [ 19.789611][ T718] stack backtrace: [ 19.789645][ T718] CPU: 8 PID: 718 Comm: kworker/u257:0 Not tainted 5.6.0- rc5-next-20200311+ #4 [ 19.789681][ T718] Call Trace: [ 19.789703][ T718] [c000000dad8cfa70] [c000000000979b40] dump_stack+0xf4/0x164 (unreliable) [ 19.789742][ T718] [c000000dad8cfac0] [c0000000001c1d78] print_unlock_imbalance_bug+0x118/0x140 [ 19.789780][ T718] [c000000dad8cfb40] [c0000000001ceaa0] lock_release+0x270/0x520 [ 19.789817][ T718] [c000000dad8cfbf0] [c0000000009a2898] __mutex_unlock_slowpath+0x68/0x400 [ 19.789854][ T718] [c000000dad8cfcc0] [c0000000004c6770] free_bprm+0xe0/0xf0 [ 19.789900][ T718] [c000000dad8cfcf0] [c0000000004c845c] __do_execve_file.isra.33+0x44c/0xda0 __do_execve_file at fs/exec.c:1904 [ 19.789938][ T718] [c000000dad8cfde0] [c0000000001391d8] call_usermodehelper_exec_async+0x218/0x250 [ 19.789977][ T718] [c000000dad8cfe20] [c00000000000b748] ret_from_kernel_thread+0x5c/0x74 > --- > fs/exec.c | 9 +++++++++ > include/linux/sched/signal.h | 9 ++++++++- > init/init_task.c | 1 + > kernel/fork.c | 1 + > 4 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/fs/exec.c b/fs/exec.c > index d820a7272a76..ffeebb1f167b 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) > { > struct task_struct *tsk; > struct mm_struct *old_mm, *active_mm; > + int ret; > > /* Notify parent that we're no longer interested in the old VM */ > tsk = current; > @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) > return -EINTR; > } > } > + > + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); > + if (ret) > + return ret; > + > task_lock(tsk); > active_mm = tsk->active_mm; > membarrier_exec_mmap(mm); > @@ -1438,6 +1444,8 @@ static void free_bprm(struct linux_binprm *bprm) > { > free_arg_pages(bprm); > if (bprm->cred) { > + if (!bprm->mm) > + mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > @@ -1487,6 +1495,7 @@ void install_exec_creds(struct linux_binprm *bprm) > * credentials; any time after this it may be unlocked. > */ > security_bprm_committed_creds(bprm); > + mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > } > EXPORT_SYMBOL(install_exec_creds); > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index 88050259c466..a29df79540ce 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -224,7 +224,14 @@ struct signal_struct { > > struct mutex cred_guard_mutex; /* guard against foreign influences on > * credential calculations > - * (notably. ptrace) */ > + * (notably. ptrace) > + * Deprecated do not use in new code. > + * Use exec_update_mutex instead. > + */ > + struct mutex exec_update_mutex; /* Held while task_struct is being > + * updated during exec, and may have > + * inconsistent permissions. > + */ > } __randomize_layout; > > /* > diff --git a/init/init_task.c b/init/init_task.c > index 9e5cbe5eab7b..bd403ed3e418 100644 > --- a/init/init_task.c > +++ b/init/init_task.c > @@ -26,6 +26,7 @@ static struct signal_struct init_signals = { > .multiprocess = HLIST_HEAD_INIT, > .rlim = INIT_RLIMITS, > .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), > + .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex), > #ifdef CONFIG_POSIX_TIMERS > .posix_timers = LIST_HEAD_INIT(init_signals.posix_timers), > .cputimer = { > diff --git a/kernel/fork.c b/kernel/fork.c > index 60a1295f4384..12896a6ecee6 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) > sig->oom_score_adj_min = current->signal->oom_score_adj_min; > > mutex_init(&sig->cred_guard_mutex); > + mutex_init(&sig->exec_update_mutex); > > return 0; > }