Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 09/14, Jeff Layton wrote:
>>
>> POSIX mandates that open fds and their associated file locks should be
>> preserved across an execve. This works, unless the process is
>> multithreaded at the time that execve is called.
>>
>> In that case, we'll end up unsharing the files_struct but the locks will
>> still have their fl_owner set to the address of the old one. Eventually,
>> when the other threads die and the last reference to the old
>> files_struct is put, any POSIX locks get torn down since it looks like
>> a close occurred on them.
>>
>> The result is that all of your open files will be intact with none of
>> the locks you held before execve. The simple answer to this is "use OFD
>> locks", but this is a nasty surprise and it violates the spec.
>>
>> Fix this by doing unshare_files later during exec,
>
> See my reply to 1/3... if we can forget about the races with get_files_struct()
> we can probably make a much simpler patch, plus we do not need 2/2, afaics.
>
> What I really can't understand is why we need to _change_ current->files
> early in do_execve().
>
> IOW. Lets ignore do_close_on_exec(), lets ignore the fact that unshare_fd()
> can fail and thus it makes sense to call it before point-of-no-return.
>
> Any other reason why we can't simply call unshare_files() at the end of
> __do_execve_file() on success?

The reason we call unshare_files is in case the files are shared with
another process, AKA old-style Linux threads, or someone being clever.
In that case we need a private copy of files for close-on-exec, because
we should not close the files of the other process that has not called
exec.

The only reason for calling unshare_files before the point of no return
is so that we can return a good error to the calling process if
unshare_files fails.

Given that "files->count > 1" should only exist in rare and crazy
cases, I expect we can legitimately have exec fail hard if we hit
-ENOMEM in that case and kill the calling process. AKA it would be
reasonable to move unshare_files to just above do_close_on_exec in
flush_old_exec.

We could further make unshare_files not return displaced and just drop
it.

Thinking about it, Jeff's version already by necessity places
unshare_files after de_thread, so it is already after the point of no
return. So there really is no point in trying hard with displaced
files.

Eric
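
To make the ordering argument above concrete, here is a rough userspace
sketch of "unshare right before close-on-exec". It is only a toy model and
not kernel code: every toy_* name, the fixed-size fd table, and the -1
return standing in for "kill the task" are assumptions made for
illustration. It shows only that the exec'ing task gets a private table
before its close-on-exec fds are dropped, so the other sharer's table is
left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FDS 8

/* Toy stand-in for files_struct; NOT the real kernel layout. */
struct toy_files {
	int  count;                 /* number of tasks sharing this table */
	bool open[TOY_MAX_FDS];     /* fd slot in use? */
	bool cloexec[TOY_MAX_FDS];  /* close-on-exec flag per fd */
};

/* Toy stand-in for a task; only the files pointer matters here. */
struct toy_task {
	struct toy_files *files;
};

/*
 * Give the task a private copy of the table if it is shared.
 * Returns 0 on success, -1 if the copy cannot be allocated.
 */
static int toy_unshare_files(struct toy_task *t)
{
	struct toy_files *old = t->files;
	struct toy_files *copy;

	if (old->count == 1)
		return 0;           /* already private, nothing to do */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;

	memcpy(copy, old, sizeof(*copy));
	copy->count = 1;
	old->count--;               /* drop our reference to the old table */
	t->files = copy;
	return 0;
}

/* Close every fd marked close-on-exec in the given table. */
static void toy_close_on_exec(struct toy_files *f)
{
	for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
		if (f->open[fd] && f->cloexec[fd]) {
			f->open[fd] = false;
			f->cloexec[fd] = false;
		}
	}
}

/*
 * Models the proposed ordering: unshare immediately before the
 * close-on-exec pass. We are assumed to be past the point of no
 * return, so a failure here is reported as -1, meaning the caller
 * would kill the task rather than return an error to it.
 */
static int toy_flush_old_exec(struct toy_task *t)
{
	if (toy_unshare_files(t) != 0)
		return -1;
	toy_close_on_exec(t->files);
	return 0;
}
```

With two tasks pointing at one table whose count is 2, calling
toy_flush_old_exec() on one of them leaves the other task's fds untouched,
including those marked close-on-exec.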