Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 11/04, Oleg Nesterov wrote:
>>
>> On 11/04, Oleg Nesterov wrote:
>> >
>> > On 11/04, Eric W. Biederman wrote:
>> > >
>> > > The following mostly correct patch modifies zap_other_threads in
>> > > the case of a de_thread to not wait for zombies to be reaped. The only
>> > > case that cares is ptrace (as threads are self-reaping). So I don't
>> > > think this will cause any problems except removing the strace -f race.
>> >
>> > From my previous email:
>> >
>> > So the only plan I currently have is to change de_thread() to wait until
>> > the other threads pass exit_notify() or even exit_signals(), but I don't
>> > like this.
>> >
>> > And yes, I don't like this, but perhaps it is what we should do.
>> >
>> > The patch is incomplete and racy (afaics), and the SIGNAL_GROUP_EXIT
>> > checks don't look right, but of course technically this change should
>> > be simple enough.
>> >
>> > But not that simple. Just for example, the exiting sub-threads should
>> > not run with ->group_leader pointing to nowhere, in case it was reaped
>> > by de_thread.
>>
>> Not to mention other potential problems outside of ptrace/exec. For example,
>> userns_install() can fail after mt-exec even without ptrace, simply because
>> thread_group_empty() can be false. Sure, that is easy to fix, and probably
>> _install() should use signal->live anyway, but still.
>>
>> And I didn't mention the fun with sighand unsharing. We simply can't do this
>> until all sub-threads go away. IOW, your patch breaks the usage of ->siglock:
>> the execing thread and the zombie threads would use different locks to, say,
>> remove a task from the thread group. Again, this is fixable, but not that
>> simple.
>>
>> > And we have another problem with PTRACE_EVENT_EXIT, which can lead to the
>> > same deadlock. Unfortunately, the semantics of PTRACE_EVENT_EXIT were never
>> > defined, and this change would be user-visible.
>> >
>> > And if we are adding user-visible changes anyway, then perhaps we could
>> > simply untrace the traced sub-threads on exec. This change is simple; we
>> > do not even need to touch exec/de_thread, we could just change exit_notify()
>> > to ignore ->ptrace if exec is in progress. But I'm afraid we can't do this.
>
> So I was thinking about something like below. Untested, probably
> buggy/incomplete too, but hopefully it can work.
>
> flush_old_exec() calls the new kill_sub_threads() helper, which waits until
> all the sub-threads pass exit_notify().
>
> de_thread() is called after install_exec_creds(); it is simplified and waits
> for thread_group_empty() without cred_guard_mutex.
>
> But again, I do not really like this, and we need to do something with
> PTRACE_EVENT_EXIT anyway; that needs another, separate change, and it will
> be user-visible.
>
> And I disagree that this has nothing to do with cred_guard_mutex. In any
> case we should narrow its scope in the do_execve() path. Why do we take it
> so early? Why do we need to do, say, copy_strings() with this lock held?
> The original motivation for this is gone: acct_arg_size() can work just
> fine even if multiple threads call sys_execve().
>
> I'll try to discuss the possible changes in the LSM hooks with Jann; I
> still think that this is what we actually need to do. At least try to do;
> possibly this is too complicated.
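Oleg's patch itself followed here and is snipped from this excerpt. Purely as
an illustration of the approach he describes, a kill_sub_threads()-style
helper might look like the sketch below. It assumes a notify_count handshake
whose decrement-and-wake side would move into exit_notify(); the locking and
error paths are hand-waved, and none of this is the real patch:

/*
 * Illustrative sketch only, not Oleg's patch: called from
 * flush_old_exec() to kill every other thread and wait until they
 * have all passed exit_notify().  Mainline's de_thread() instead
 * keeps waiting until the zombies are reaped in release_task(),
 * which is the wait that deadlocks against a tracer stuck on
 * cred_guard_mutex.
 */
static int kill_sub_threads(struct task_struct *tsk)
{
	struct signal_struct *sig = tsk->signal;
	spinlock_t *lock = &tsk->sighand->siglock;

	if (thread_group_empty(tsk))
		return 0;

	spin_lock_irq(lock);
	if (signal_group_exit(sig)) {
		/* a group exit or another exec already won the race */
		spin_unlock_irq(lock);
		return -EAGAIN;
	}
	sig->group_exit_task = tsk;
	sig->notify_count = zap_other_threads(tsk); /* SIGKILL the rest */

	/*
	 * Assumes a matching (hypothetical) hunk in exit_notify() that
	 * decrements notify_count and wakes group_exit_task; in mainline
	 * that only happens later, in __exit_signal() at release_task()
	 * time.
	 */
	while (sig->notify_count) {
		__set_current_state(TASK_KILLABLE);
		spin_unlock_irq(lock);
		schedule();
		if (__fatal_signal_pending(tsk))
			return -EINTR; /* group_exit_task cleanup elided */
		spin_lock_irq(lock);
	}
	spin_unlock_irq(lock);
	return 0;
}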
The code below looks interesting. Am I wrong, or do we get the
PTRACE_EVENT_EXIT case wrong for multi-threaded execs when we don't exec
from the primary thread?

AKA I think the primary thread will pass through
ptrace_event(PTRACE_EVENT_EXIT) before we steal its thread, and likewise
the thread that calls exec won't pass through
ptrace_event(PTRACE_EVENT_EXIT). Which I suspect gives us quite a bit of
latitude to skip that notification entirely without userspace noticing.

We need to test to be certain that both gdb and strace can cope. But I do
suspect we could just throw ptrace_event(PTRACE_EVENT_EXIT) out in the
case of a multi-threaded exec and no one would care.

Eric
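For concreteness, a sketch of what throwing the event out for a
multi-threaded exec could look like in do_exit(). The group_exit_task test,
and its lockless read, are a hypothetical illustration of the idea, not a
tested or proposed patch:

	/*
	 * Hypothetical illustration: suppress the PTRACE_EVENT_EXIT stop
	 * for threads being killed by a multi-threaded exec, so a tracer
	 * itself blocked in exec on cred_guard_mutex can't deadlock them
	 * here.  Mainline's do_exit() unconditionally does
	 * ptrace_event(PTRACE_EVENT_EXIT, code).  signal->group_exit_task
	 * is only set while de_thread() (i.e. an exec) is waiting for us;
	 * reading it without siglock, as done here, is racy and glossed
	 * over in this sketch.
	 */
	if (likely(!current->signal->group_exit_task))
		ptrace_event(PTRACE_EVENT_EXIT, code);

An ordinary (non-exec) exit would still report the event; only the mt-exec
case changes, which is the case the gdb/strace testing above would need to
cover.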