On 3/4/20 5:33 PM, Eric W. Biederman wrote: > Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> writes: > >> On 3/3/20 9:08 PM, Eric W. Biederman wrote: >>> Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> writes: >>> >>>> On 3/3/20 4:18 PM, Eric W. Biederman wrote: >>>>> Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> writes: >>>>>> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c >>>>>> new file mode 100644 >>>>>> index 0000000..6d8a048 >>>>>> --- /dev/null >>>>>> +++ b/tools/testing/selftests/ptrace/vmaccess.c >>>>>> @@ -0,0 +1,66 @@ >>>>>> +// SPDX-License-Identifier: GPL-2.0+ >>>>>> +/* >>>>>> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> >>>>>> + * All rights reserved. >>>>>> + * >>>>>> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks >>>>>> + * when de_thread is blocked with ->cred_guard_mutex held. >>>>>> + */ >>>>>> + >>>>>> +#include "../kselftest_harness.h" >>>>>> +#include <stdio.h> >>>>>> +#include <fcntl.h> >>>>>> +#include <pthread.h> >>>>>> +#include <signal.h> >>>>>> +#include <unistd.h> >>>>>> +#include <sys/ptrace.h> >>>>>> + >>>>>> +static void *thread(void *arg) >>>>>> +{ >>>>>> + ptrace(PTRACE_TRACEME, 0, 0L, 0L); >>>>>> + return NULL; >>>>>> +} >>>>>> + >>>>>> +TEST(vmaccess) >>>>>> +{ >>>>>> + int f, pid = fork(); >>>>>> + char mm[64]; >>>>>> + >>>>>> + if (!pid) { >>>>>> + pthread_t pt; >>>>>> + >>>>>> + pthread_create(&pt, NULL, thread, NULL); >>>>>> + pthread_join(pt, NULL); >>>>>> + execlp("true", "true", NULL); >>>>>> + } >>>>>> + >>>>>> + sleep(1); >>>>>> + sprintf(mm, "/proc/%d/mem", pid); >>>>>> + f = open(mm, O_RDONLY); >>>>>> + ASSERT_LE(0, f); >>>>>> + close(f); >>>>>> + f = kill(pid, SIGCONT); >>>>>> + ASSERT_EQ(0, f); >>>>>> +} >>>>>> + >>>>>> +TEST(attach) >>>>>> +{ >>>>>> + int f, pid = fork(); >>>>>> + >>>>>> + if (!pid) { >>>>>> + pthread_t pt; >>>>>> + >>>>>> + pthread_create(&pt, NULL, thread, NULL); >>>>>> + pthread_join(pt, NULL); >>>>>> + execlp("true", "true", NULL); >>>>>> + } >>>>>> + >>>>>> + sleep(1); >>>>>> + f = ptrace(PTRACE_ATTACH, pid, 0L, 0L); >>>>> >>>>> To be meaningful this code needs to learn to loop when >>>>> ptrace returns -EAGAIN. >>>>> >>>>> Because that is pretty much what any self respecting user space >>>>> process will do. >>>>> >>>>> At which point I am not certain we can say that the behavior has >>>>> sufficiently improved not to be a deadlock. >>>>> >>>> >>>> In this special dead-duck test it won't work, but it would >>>> still be lots more transparent what is going on, since previously >>>> you had two zombie process, and no way to even output debug >>>> messages, which also all self respecting user space processes >>>> should do. >>> >>> Agreed it is more transparent. So if you are going to deadlock >>> it is better. >>> >>> My previous proposal (which I admit is more work to implement) would >>> actually allow succeeding in this case and so it would not be subject to >>> a dead lock (even via -EGAIN) at this point. >>> >>>> So yes, I can at least give a good example and re-try it several >>>> times together with wait4 which a tracer is expected to do. >>> >>> Thank you, >>> >>> Eric >>> >> >> Okay, I think it can be done with minimal API changes, >> but it needs two mutexes, one that guards the execve, >> and one that guards only the credentials. >> >> If no traced sibling thread exists, the mutexes are used this way: >> lock(exec_guard_mutex) >> cred_locked_in_execve = true; >> de_thread() >> lock(cred_guard_mutex) >> unlock(cred_guard_mutex) >> cred_locked_in_execve = false; >> unlock(exec_guard_mutex) >> >> so effectively no API change at all. >> >> If a traced sibling thread exists, the mutexes are used differently: >> lock(exec_guard_mutex) >> cred_locked_in_execve = true; >> unlock(exec_guard_mutex) >> de_thread() >> lock(cred_guard_mutex) >> unlock(cred_guard_mutex) >> lock(exec_guard_mutex) >> cred_locked_in_execve = false; >> unlock(exec_guard_mutex) >> >> Only the case changes that would deadlock anyway. > > > Let me propose a slight alternative that I think sets us up for long > term success. > > Leave cred_guard_mutex as is, but declare it undesirable. The > cred_guard_mutex as designed really is something we should get rid of. > As it it can sleep over several different userspace accesses. The > copying of the exec arguments is technically as prone to deadlock as the > ptrace case. > > Add a new mutex with a better name perhaps "exec_change_mutex" that is > used to guard the changes that exec makes to a process. > > Then we gradually shift all the cred_guard_mutex users over to the new > mutex. AKA one patch per user of cred_guard_mutex. At each patch that > shifts things over we will have the opportunity to review the code to > see that there no funny dependencies that were missed. > > I will sign up for working on the no_new_privs and ptrace_attach cases > as I think I can make those happen. Especially no_new_privs. > > Getting the easier cases will resolve your issues and put things on a > better footing. > > Eric > Okay, however I think we will need two mutexes in the long term. So currently I have reduced the cred_guard_mutex to protect just the loading of the executable code in the process vm, since that is what works for vm_access, (one of the test cases). And another mutex that protects the whole execve function, that is need for ptrace, (and seccomp). But I have only a test case for ptrace. If I understand that right, I should not recycle cred_guard_mutex but leave it as is, and create two additional mutexes which will take over step by step. Sounds reasonable, indeed. I will send an update (v6) what I have right now, but just for information, so you can see how my minimal API-Change approach works. Bernd.