Jeff Layton <jlayton@xxxxxxxxxx> writes: > POSIX mandates that open fds and their associated file locks should be > preserved across an execve. This works, unless the process is > multithreaded at the time that execve is called. > > In that case, we'll end up unsharing the files_struct but the locks will > still have their fl_owner set to the address of the old one. Eventually, > when the other threads die and the last reference to the old > files_struct is put, any POSIX locks get torn down since it looks like > a close occurred on them. > > The result is that all of your open files will be intact with none of > the locks you held before execve. The simple answer to this is "use OFD > locks", but this is a nasty surprise and it violates the spec. > > Fix this by doing unshare_files later during exec, after we've already > killed off the other threads in the process. This helps ensure that we > only unshare the files_struct during exec when it is truly shared with > other processes. > > Note that because the unshare_files call is now done just after > de_thread, we need a mechanism to pass the displaced files_struct back > up to __do_execve_file. This is done via a new displaced_files field > inside the linux_binprm. Actually because unshare_files happens after de_thread (the point of no return) we don't need to do anything with displaced files. A failing exec will clear brpm->mm, and search_binary_handler will convert any error into force_sigsegv. > Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx> > Reported-by: Daniel P. Berrangé <berrange@xxxxxxxxxx> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > --- > fs/exec.c | 11 ++++++----- > include/linux/binfmts.h | 1 + > 2 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index c8a68481d7eb..b4a7e659d908 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1262,6 +1262,10 @@ int flush_old_exec(struct linux_binprm * bprm) > if (retval) > goto out; > > + retval = unshare_files(&bprm->displaced_files); > + if (retval) > + goto out; > + We can place this just above do_close_on_exec. And simply modify unshare_files to instead of returning displaced simply put it. We gain nothing be returning displaced_files as we are already past the point of no return, and userspace will get an unblockable SIGSEGV and exit. > /* > * Must be called _before_ exec_mmap() as bprm->mm is > * not visibile until then. This also enables the update > @@ -1713,7 +1717,7 @@ static int __do_execve_file(int fd, struct filename *filename, > { > char *pathbuf = NULL; > struct linux_binprm *bprm; > - struct files_struct *displaced; > + struct files_struct *displaced = NULL; > int retval; > > if (IS_ERR(filename)) > @@ -1735,10 +1739,6 @@ static int __do_execve_file(int fd, struct filename *filename, > * further execve() calls fail. */ > current->flags &= ~PF_NPROC_EXCEEDED; > > - retval = unshare_files(&displaced); > - if (retval) > - goto out_ret; > - > retval = -ENOMEM; > bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); > if (!bprm) > @@ -1817,6 +1817,7 @@ static int __do_execve_file(int fd, struct filename *filename, > would_dump(bprm, bprm->file); > > retval = exec_binprm(bprm); > + displaced = bprm->displaced_files; > if (retval < 0) > goto out; > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index c05f24fac4f6..d7ec384bb1b0 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -49,6 +49,7 @@ struct linux_binprm { > unsigned int taso:1; > #endif > unsigned int recursion_depth; /* only for search_binary_handler() */ > + struct files_struct * displaced_files; > struct file * file; > struct cred *cred; /* new credentials */ > int unsafe; /* how unsafe this exec is (mask of LSM_UNSAFE_*) */