Adding the containers list because that is the general place for these kinds of discussions. Cyrill Gorcunov <gorcunov@xxxxxxxxx> writes: > Hi Eric! A few days ago we've noticed that our zombie00 test case started > failing: https://ci.openvz.org/job/CRIU/view/All/job/CRIU-linux-next/406/console > --- > ======================== Run zdtm/static/zombie00 in h ========================= > Start test > ./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out > Run criu dump > Run criu restore > Send the 15 signal to 30 > Wait for zdtm/static/zombie00(30) to die for 0.100000 > ################ Test zdtm/static/zombie00 FAIL at result check ################ > > I've narrowed problem down to commit > > | From ce99dd5fd5f600f9f4f0d37bb8847c1cb7c6e4fc Mon Sep 17 00:00:00 2001 > | From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > | Date: Thu, 13 Oct 2016 21:23:16 -0500 > | Subject: [PATCH] mm: Add a user_ns owner to mm_struct and fix > | ptrace_may_access > | > | During exec dumpable is cleared if the file that is being executed is > | not readable by the user executing the file. A bug in > | ptrace_may_access allows reading the file if the executable happens to > | enter into a subordinate user namespace (aka clone(CLONE_NEWUSER), > | unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER). > > and the reason is that the zombie tasks do not have task::mm and in resut > we're obtaining -EPERM when trying to read task->exit_code from > /proc/pid/stat. Hmm. As I read the code exit_code should be returned to userspace as a 0. It does not look to me as if userspace should see an error in that case. > Looking into commit I suspect when mm = NULL we've to move back the test > ptrace_has_cap(__task_cred(task)->user_ns, mode)? Maybe. We might want to consider if these lines from do_task_stat make any sense. if (permitted) seq_put_decimal_ll(m, " ", task->exit_code); else seq_puts(m, " 0"); Looking at the code. Nothing changes behavior except for privileged tasks looking at processes without an mm. So yes it may be sane to tweak that part of the check. AKA in the in for-next branch the code currenty says: mm = task->mm; if (!mm || ((get_dumpable(mm) != SUID_DUMP_USER) && !ptrace_has_cap(mm->user_ns, mode))) return -EPERM; And in the case there is no mm there is no way to get past returning -EPERM. Looking at why we use ptrace_may_access in the exit_code case I see a couple of relevant commits. The commit that added the exit code check: commit f83ce3e6b02d5e48b3a43b001390e2b58820389d Author: Jake Edge <jake@xxxxxxx> Date: Mon May 4 12:51:14 2009 -0600 proc: avoid information leaks to non-privileged processes By using the same test as is used for /proc/pid/maps and /proc/pid/smaps, only allow processes that can ptrace() a given process to see information that might be used to bypass address space layout randomization (ASLR). These include eip, esp, wchan, and start_stack in /proc/pid/stat as well as the non-symbolic output from /proc/pid/wchan. ASLR can be bypassed by sampling eip as shown by the proof-of-concept code at http://code.google.com/p/fuzzyaslr/ As part of a presentation (http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were also noted as possibly usable information leaks as well. The start_stack address also leaks potentially useful information. Cc: Stable Team <stable@xxxxxxxxxx> Signed-off-by: Jake Edge <jake@xxxxxxx> Acked-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> Acked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> The change that started protecting start_code/end_code and generally using these permissions to protect this class of information: commit 5883f57ca0008ffc93e09cbb9847a1928e50c6f3 Author: Kees Cook <kees.cook@xxxxxxxxxxxxx> Date: Wed Mar 23 16:42:53 2011 -0700 proc: protect mm start_code/end_code in /proc/pid/stat While mm->start_stack was protected from cross-uid viewing (commit f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged processes")), the start_code and end_code values were not. This would allow the text location of a PIE binary to leak, defeating ASLR. Note that the value "1" is used instead of "0" for a protected value since "ps", "killall", and likely other readers of /proc/pid/stat, take start_code of "0" to mean a kernel thread and will misbehave. Thanks to Brad Spengler for pointing this out. Addresses CVE-2011-0726 Signed-off-by: Kees Cook <kees.cook@xxxxxxxxxxxxx> Cc: <stable@xxxxxxxxxx> Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx> Cc: David Howells <dhowells@xxxxxxxxxx> Cc: Eugene Teo <eugeneteo@xxxxxxxxx> Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx> Cc: Brad Spengler <spender@xxxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> The commit that added task->exit_code: commit 5b172087f99189416d5f47fd7ab5e6fb762a9ba3 Author: Cyrill Gorcunov <gorcunov@xxxxxxxxxx> Date: Thu May 31 16:26:44 2012 -0700 c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat We would like to have an ability to restore command line arguments and program environment pointers but first we need to obtain them somehow. Thus we put these values into /proc/$pid/stat. The exit_code is needed to restore zombie tasks. Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx> Acked-by: Kees Cook <keescook@xxxxxxxxxxxx> Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx> Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx> Cc: Tejun Heo <tj@xxxxxxxxxx> Cc: Andrew Vagin <avagin@xxxxxxxxxx> Cc: Vasiliy Kulikov <segoon@xxxxxxxxxxxx> Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Looking at do_task_stat everything else that requires permitted in do_tack_stat is an address. exit_code is something else so I am not at all certain the ptrace_may_access permission check makes sense. A process without an mm is fundamentally undumpable so an error should be returned in any case. So I don't see any harm in failing ptrace_may_access in general. At the same time I can see how not preserving the existing behavior is problematic. So I am probably going to tweak the !mm case so that instead of failing we perform the old capable check in that case. That seems the mot certain way to avoid regressions. With that said, why is exit_code behind a ptrace_may_access permission check? Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers