James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> writes:

> On Fri, 2016-07-08 at 18:52 -0500, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> writes:
>>
>> > On July 8, 2016 1:38:19 PM PDT, Andrew Vagin <avagin@xxxxxxxxxxxxx>
>> > wrote:
>>
>> > > What do you think about the idea to mount nsfs and be able to
>> > > look up any alive namespace by inum:
>> >
>> > I think I like it. It will give us a way to enter any extant
>> > namespace. It will work for Eric's fs namespaces as well. Perhaps
>> > a /process/ns/<inum> Directory?
>
> As you understood, I meant /proc/ns/<inum> (damn mobile phone
> completions).
>
>> *Shivers*
>>
>> That makes it very easy to bypass any existing controls that exist
>> for getting at namespaces. It is true that everything of that kind
>> is directory based but still.
>>
>> Plus I think it would serve as information leak to information
>> outside of the container.
>>
>> An operation to get a user namespace file descriptor from some kernel
>> object sounds reasonably sane.
>>
>> A great big list of things sounds about as scary as it can get. This
>> is not the time to be making it easier to escape from containers.
>
> To be honest, I think this argument is rubbish. If we're afraid of
> giving out a list of all the namespaces, it means we're afraid there's
> some security bug and we're trying to obscure it by making the list
> hard to get. All we've done is allayed fears about the bug but the
> hackers still know the portals to get through.
>
> If such a bug exists, it will be possible to exploit it by simply
> reconstructing the information from the individual process directories,
> so obscurity doesn't protect us and all it does is give us a false
> sense of security. If such a bug doesn't exist, then all the security
> mechanisms currently in place (like no re-entry to prior namespace)
> should protect us and we can give out the list.
>
> Let's deal with the world as we'd like it to be (no obscure namespace
> bugs) and accept the consequences and the responsibility for fixing
> them if we turn out to be slightly incorrect. We'll end up in a far
> better place than security by obscurity would land us.

No. That is not the fear.

The permission checks on /proc/self/ns/xxx are different than if the
namespace is bind mounted somewhere. That was done deliberately and
with a reasonable amount of forethought.

You are asking to throw those permission checks out. The answer is no.

Furthermore there is a much clearer reason not to go with a list of all
namespaces. A list of all namespaces breaks CRIU. As you have described
it the list will change depending upon which machine you restore a
checkpoint on. I honestly don't know what kind of havoc that will cause
but it is certainly something we won't be able to checkpoint no matter
how hard we try.

A global list of namespaces, especially of the kind that you can open
and get a handle to the namespace, is just not appropriate.

I know inode numbers come darn close to names but they aren't really
names and if it comes to it we can figure out how to preserve an
application's view of it all across a checkpoint/restart. So far it
hasn't proven necessary to preserve any inode numbers across
checkpoint/restart but again it is theoretically possible if it becomes
necessary.

Throwing away checkpoint/restart support for the sake of
checkpoint/restart is a no-go.

Containers fundamentally imply you don't have global visibility, and
that is a good thing.

Eric
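
For anyone following along: the inode numbers being discussed are what
shows up in the link targets under /proc/<pid>/ns/, in the form
"user:[4026531837]". Below is a minimal sketch in Python of reading them
for a single process, assuming a Linux system with /proc mounted. The
helper names (parse_ns_link, list_namespaces) are made up for
illustration and are not an existing API. It only reads what the
per-process directory already exposes under the existing permission
checks; it is not the global /proc/ns/<inum> list proposed above.

```python
import os
import re

# Link targets under /proc/<pid>/ns/ look like "user:[4026531837]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: inode number} for one process.

    Returns an empty dict if /proc/<pid>/ns is absent or unreadable.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result
    for entry in entries:
        try:
            target = os.readlink(os.path.join(ns_dir, entry))
            _, inum = parse_ns_link(target)
        except (OSError, ValueError):
            # Skip entries we may not read or that are not ns links.
            continue
        result[entry] = inum
    return result


if __name__ == "__main__":
    for name, inum in sorted(list_namespaces().items()):
        print(f"{name}: {inum}")
```

Note that the numbers printed depend on the running kernel instance,
which is the point made above about them not being stable names across
a checkpoint/restore on a different machine.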