Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes: > On 12.03.2020 15:24, Eric W. Biederman wrote: >> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes: >> >>> On 09.03.2020 00:38, Eric W. Biederman wrote: >>>> >>>> The cred_guard_mutex is problematic. The cred_guard_mutex is held >>>> over the userspace accesses as the arguments from userspace are read. >>>> The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other >>>> threads are killed. The cred_guard_mutex is held over >>>> "put_user(0, tsk->clear_child_tid)" in exit_mm(). >>>> >>>> Any of those can result in deadlock, as the cred_guard_mutex is held >>>> over a possible indefinite userspace waits for userspace. >>>> >>>> Add exec_update_mutex that is only held over exec updating process >>>> with the new contents of exec, so that code that needs not to be >>>> confused by exec changing the mm and the cred in ways that can not >>>> happen during ordinary execution of a process. >>>> >>>> The plan is to switch the users of cred_guard_mutex to >>>> exec_udpate_mutex one by one. This lets us move forward while still >>>> being careful and not introducing any regressions. >>>> >>>> Link: https://lore.kernel.org/lkml/20160921152946.GA24210@xxxxxxxxxxxxxx/ >>>> Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ >>>> Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@xxxxxxxxxx/ >>>> Link: https://lore.kernel.org/lkml/20160923095031.GA14923@xxxxxxxxxx/ >>>> Link: https://lore.kernel.org/lkml/20170213141452.GA30203@xxxxxxxxxx/ >>>> Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") >>>> Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") >>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>>> --- >>>> fs/exec.c | 9 +++++++++ >>>> include/linux/sched/signal.h | 9 ++++++++- >>>> init/init_task.c | 1 + >>>> kernel/fork.c | 1 + >>>> 4 files changed, 19 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/exec.c b/fs/exec.c >>>> index d820a7272a76..ffeebb1f167b 100644 >>>> --- a/fs/exec.c >>>> +++ b/fs/exec.c >>>> @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) >>>> { >>>> struct task_struct *tsk; >>>> struct mm_struct *old_mm, *active_mm; >>>> + int ret; >>>> >>>> /* Notify parent that we're no longer interested in the old VM */ >>>> tsk = current; >>>> @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) >>>> return -EINTR; >>>> } >>>> } >>>> + >>>> + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); >>>> + if (ret) >>>> + return ret; >>> >>> You missed old_mm->mmap_sem unlock. See here: >> >> Duh. Thank you. >> >> I actually need to switch the lock ordering here, and I haven't yet >> because my son was sick yesterday. > > There is some fundamental problem with your patch, since the below fires in 100% cases > on current linux-next: Thank you. I have just backed this out of linux-next for now because it is clearly flawed. You make some good points about the recursion. I will go back to the drawing board and see what I can work out. > [ 22.838717] kernel BUG at fs/exec.c:1474! > > diff --git a/fs/exec.c b/fs/exec.c > index 47582cd97f86..0f77f8c94905 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1470,8 +1470,10 @@ static void free_bprm(struct linux_binprm *bprm) > { > free_arg_pages(bprm); > if (bprm->cred) { > - if (!bprm->mm) > + if (!bprm->mm) { > + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); > mutex_unlock(¤t->signal->exec_update_mutex); > + } > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > @@ -1521,6 +1523,7 @@ void install_exec_creds(struct linux_binprm *bprm) > * credentials; any time after this it may be unlocked. > */ > security_bprm_committed_creds(bprm); > + BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); > mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > } > > --------------------------------------------------------------------------------------------- > > First time the mutex is unlocked in: > > exec_binprm()->search_binary_handler()->.load_binary->install_exec_creds() > > Then exec_binprm()->search_binary_handler()->.load_binary->flush_old_exec() clears mm: > > bprm->mm = NULL; > > Second time the mutex is unlocked in free_bprm(): > > if (bprm->cred) { > if (!bprm->mm) > mutex_unlock(¤t->signal->exec_update_mutex); > > My opinion is we should not relay on side indicators like bprm->mm. Better you may > introduce struct linux_binprm::exec_update_mutex_is_locked. So the next person dealing > with this after you won't waste much time on diving into this. Also, if someone decides > to change the place, where bprm->mm is set into NULL, this person will bump into hell > of dependences between unrelated components like your newly introduced mutex. > > So, I'm strongly for *struct linux_binprm::exec_update_mutex_is_locked*, since this improves > modularity. Am I wrong or is that also a problem with cred_guard_mutex? Eric