Aleksa Sarai <asarai@xxxxxxx> writes: >> Please verify but the ptrace issue that allowed processes in a container >> to call setns on our processes should be fixed as of 4.10-rc1. And the >> change has been marked for backporting. > > ptrace(2) is not the only issue, the issue that we had in runC is that > a process joining a namespace may have file descriptors that refer to > the host filesystem. If the process joining is dumpable, a racing > process inside the container can access those file descriptors through > the /proc/[pid]/fd/... mechanism. > > See CVE-2016-9962. Wow, I said that badly. But the issue is ptrace_attach and the ptrace_may_access checks. >> AKA it should be this fix that removes the need for your dumpable setting. >> bfedb589252c ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks") > > I will check, though from what I recall that patch doesn't fix the > ptrace_may_access checks. Not to mention it won't help if the > container doesn't have it's own user namespace. That change very much is to the ptrace_may_access checks. You are not playing with setgroups if you don't have your own user namespace. So I don't see how the other cases are relevant. Going further there are only two namespaces that are only 3 pieces of information that I see are relevant in this context. The pid namespace, the user namespace, and the process credentials. You can not join a PID namespace you can only fork into one. The user namespace is relevant in older kernels as joining a user namespace would allow ptrace_has_cap check in ptrace_may_access to pass. After the change listed above that check does not start passing until after exec. The credentials on your process are relevant as changing those allow the ptrace_may_access checks to start passing for other sets of processes. What I think I would do in the situation you describe is to join what you are going to join. Limit yourself to creating pid namespaces with unshare. If you are joining a user namespace set undumpable. If you are creating a user namespace create it and then set undumpable. fork your process into the new pid namespace. change your pocesses credentials. exec whatever you are execing, letting close_on_exec clean up for you. >> Now with that said I believe we want to add the following change now >> that dumpable is user namespace relative. That will use not the >> GLOBAL_ROOT_UID/GID but instead uid and gid 0 in the namespace >> that dumpable is relative too. > > Sure, but that's tangential to the issue under discussion. It is pretty much the only case I see worth considering. Certainly I don't see setgroups as a motivational example. >> But ugh! Your case is even more confused that I had first noticed. >> Saying that a processes is undumpable is completely unnecessary >> when you are entering into a new fresh user namespace. Touching >> setgroups at any point where there are other processes in the namespace >> makes no sense whatsoever. > > Currently in runC the ordering for mixed create-and-join namespaces is > that we first join existing namespaces and _then_ create new ones. So > we need to be non-dumpable to avoid the problem in CVE-2016-9962. Yes mixed create and join is trickier than a pure create and use case, and trickier than a pure join case. But see above it looks quite doable to me to avoid being vulnerable while using the existing permissions even in the case you describe. >> Clearing dumpable is to help not leak things >> into a container when you call setns on a user namespace. > > It is also to help not leak things into a container when you join > other namespaces. Most notably the PID namespace. Except that you don't strictly join a PID namespace. You set a context for children to run in a different PID namespace. So you are safe from PID namespaces as long as you don't call fork. >> + if (mode != (S_IFDIR|S_IRUGO|S_IXUGO)) { > > I'd just like to draw your attention to this special case -- why is > this special cased? What was the original reasoning behind it? Does it > make sense for a non-dumpable process to allow someone to change the > mode of some random /proc/[pid]/ directories? This has nothing at all to do with changing modes and is all about what uid/gid are set on the proc inode. Usually it is the uid/gid of the process in question but occassionally for undumpable processes it is root/root to prevent people from accessing the files in question. > I get the feeling that some of this logic is a bit iffy. It looks like I forgot to carry forward the comment that explains that case in my patch. Something I need to fix before I merge it. /* * Before the /proc/pid/status file was created the only way to read * the effective uid of a /process was to stat /proc/pid. Reading * /proc/pid/status is slow enough that procps and other packages * kept stating /proc/pid. To keep the rules in /proc simple I have * made this apply to all per process world readable and executable * directories. */ Or in short. I broke ps when I removed all of the special cases, and to fix ps I added the existing special case. Not that the uid or gid of a directory that the whole world can access matters. Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers