On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
>
> The current idiom for the callers is:
>
> 	flush_old_exec(bprm);
> 	set_personality(...);
> 	setup_new_exec(bprm);
>
> In 2010 Linus split flush_old_exec into flush_old_exec and
> setup_new_exec. With the intention that setup_new_exec be what is
> called after the processes new personality is set.
>
> Move the code that doesn't depend upon the personality from
> setup_new_exec into flush_old_exec. This is to facilitate future
> changes by having as much code together in one function as possible.

Er, I *think* this is okay, but I have some questions below which maybe
you already investigated (and should perhaps get called out in the
changelog).

>
> Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
>  fs/exec.c | 85 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 44 insertions(+), 41 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 8c3abafb9bb1..0eff20558735 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm)
>  	 * undergoing exec(2).
>  	 */
>  	do_close_on_exec(me->files);
> -	return 0;
> -
> -out_unlock:
> -	mutex_unlock(&me->signal->exec_update_mutex);
> -out:
> -	return retval;
> -}
> -EXPORT_SYMBOL(flush_old_exec);
> -
> -void would_dump(struct linux_binprm *bprm, struct file *file)
> -{
> -	struct inode *inode = file_inode(file);
> -	if (inode_permission(inode, MAY_READ) < 0) {
> -		struct user_namespace *old, *user_ns;
> -		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> -
> -		/* Ensure mm->user_ns contains the executable */
> -		user_ns = old = bprm->mm->user_ns;
> -		while ((user_ns != &init_user_ns) &&
> -		       !privileged_wrt_inode_uidgid(user_ns, inode))
> -			user_ns = user_ns->parent;
>
> -		if (old != user_ns) {
> -			bprm->mm->user_ns = get_user_ns(user_ns);
> -			put_user_ns(old);
> -		}
> -	}
> -}
> -EXPORT_SYMBOL(would_dump);
> -
> -void setup_new_exec(struct linux_binprm * bprm)
> -{
> -	struct task_struct *me = current;
>  	/*
>  	 * Once here, prepare_binrpm() will not be called any more, so
>  	 * the final state of setuid/setgid/fscaps can be merged into the
> @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm)
>  		bprm->rlim_stack.rlim_cur = _STK_LIM;
>  	}
>
> -	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> -
>  	me->sas_ss_sp = me->sas_ss_size = 0;
>
>  	/*
> @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	else
>  		set_dumpable(current->mm, SUID_DUMP_USER);
>
> -	arch_setup_new_exec();
>  	perf_event_exec();

What is perf expecting to be able to examine at this point? Does it
want a view of things after arch_setup_new_exec()? (i.e. "final" TIF
flags, mmap layout, etc.) From what I can tell, the answer is "no, it's
just resetting counters", so I think this is fine. Maybe double-check
with Steve?

>  	__set_task_comm(me, kbasename(bprm->filename), true);
>
> -	/* Set the new mm task size. We have to do that late because it may
> -	 * depend on TIF_32BIT which is only updated in flush_thread() on
> -	 * some architectures like powerpc
> -	 */
> -	me->mm->task_size = TASK_SIZE;
> -
>  	/* An exec changes our domain. We are no longer part of the thread
>  	   group */
>  	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
> @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	 * credentials; any time after this it may be unlocked.
>  	 */
>  	security_bprm_committed_creds(bprm);

Similarly for the LSM hook: is it expecting a post-arch-setup view? I
don't see anything looking at task_size, TIF flags, or anything else;
they seem to be just cleaning up from the old process being replaced,
so again, I think this is okay.

Not visible in this patch, the following things now happen earlier,
which I feel should maybe get called out in the changelog, with,
perhaps, better justification than what I've got here:

	bprm->secureexec set/check (looks safe, since it depends on
	  prepare_binprm()'s security_bprm_set_creds())
	rlim_stack.rlim_cur setting (safe, just needs to happen before
	  arch_pick_mmap_layout())
	dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends
	  on much earlier would_dump(), and uid/gid depend on earlier
	  calls to prepare_binprm()'s bprm_fill_uid())
	__set_task_comm (looks safe, just dealing with the task name...)
	self_exec_id bump (looks safe, I think -- it's still after uid
	  setting)
	flush_signal_handlers() (looks safe -- nothing appears to depend on
	  mm nor personality)

> +	return 0;
> +
> +out_unlock:
> +	mutex_unlock(&me->signal->exec_update_mutex);
> +out:
> +	return retval;
> +}
> +EXPORT_SYMBOL(flush_old_exec);
> +
> +void would_dump(struct linux_binprm *bprm, struct file *file)
> +{
> +	struct inode *inode = file_inode(file);
> +	if (inode_permission(inode, MAY_READ) < 0) {
> +		struct user_namespace *old, *user_ns;
> +		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> +
> +		/* Ensure mm->user_ns contains the executable */
> +		user_ns = old = bprm->mm->user_ns;
> +		while ((user_ns != &init_user_ns) &&
> +		       !privileged_wrt_inode_uidgid(user_ns, inode))
> +			user_ns = user_ns->parent;
> +
> +		if (old != user_ns) {
> +			bprm->mm->user_ns = get_user_ns(user_ns);
> +			put_user_ns(old);
> +		}
> +	}
> +}
> +EXPORT_SYMBOL(would_dump);

The diff helpfully decided this moved would_dump(). ;) Is it worth maybe
just moving it explicitly above flush_old_exec() to avoid this churn? I
dunno.

> +
> +void setup_new_exec(struct linux_binprm * bprm)
> +{
> +	/* Setup things that can depend upon the personality */

Should this comment be above the function instead?

> +	struct task_struct *me = current;
> +
> +	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> +
> +	arch_setup_new_exec();
> +
> +	/* Set the new mm task size. We have to do that late because it may
> +	 * depend on TIF_32BIT which is only updated in flush_thread() on
> +	 * some architectures like powerpc
> +	 */
> +	me->mm->task_size = TASK_SIZE;
>  	mutex_unlock(&me->signal->exec_update_mutex);
>  	mutex_unlock(&me->signal->cred_guard_mutex);
>  }
> --
> 2.20.1
>

So, as I say, I *think* this is okay, but I always get suspicious about
reordering things in execve(). ;)

So, with a bit larger changelog discussing what's moving "earlier", I
think this looks good:

Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>

--
Kees Cook