Re: [RFC 0/4] per-namespace allowed filesystems list

On 01/24/2012 01:12 AM, Al Viro wrote:
On Mon, Jan 23, 2012 at 08:56:08PM +0400, Glauber Costa wrote:
This patch creates a list of allowed filesystems per-namespace.
The goal is to prevent users inside a container, even root,
to mount filesystems that are not allowed by the main box admin.

My main two motivators to pursue this are:
  1) We want to prevent a certain tailored view of some virtual
     filesystems, for example, by bind-mounting files with userspace
     generated data into /proc. The ability of mounting /proc inside
     the container works against this effort, while disallowing it
     via capabilities would have the effect of disallowing other
     mounts as well.

Translation, please.

2) Some filesystems are known not to behave well under a container
    environment. They require changes to work in a safe-way. We can
    whitelist only the filesystems we want.

So fix them.

This works as a whitelist. Only filesystems in the list are allowed
to be mounted. Doing a blacklist would create problems when, say,
a module is loaded. The whitelist is only checked if it is enabled first.
So any setup that was already working, will keep working. And whoever
is not interested in limiting filesystem mount, does not need
to bother about it.
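
A rough userspace model of the semantics described above may help make them concrete. This is only a sketch: the struct, the field names, and the function name are hypothetical and are not taken from the patch series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a per-namespace whitelist (illustrative names only). */
struct fs_whitelist {
    bool enabled;               /* the list is consulted only when this is set */
    const char *const *names;   /* filesystem type names allowed in the namespace */
    size_t count;
};

/* Return true if a mount of type fstype would be allowed under wl. */
static bool fs_mount_allowed(const struct fs_whitelist *wl, const char *fstype)
{
    if (!wl->enabled)
        return true;            /* whitelist off: existing setups keep working */

    for (size_t i = 0; i < wl->count; i++)
        if (strcmp(wl->names[i], fstype) == 0)
            return true;

    /* Not listed: denied, including types added later by a loaded module. */
    return false;
}
```

With the list disabled every type passes; with it enabled, a type that is not listed is refused, which is the default-deny behaviour a blacklist would not give for newly loaded modules.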

Please let me know what you guys think about it.

NAKed-by: Al Viro<viro@xxxxxxxxxxxxxxxxxx>
NAKed-because: too fucking ugly

This is bloody ridiculous; if you want to prevent a luser adming playing with
the set of mounts you've given it, the right way to go is not to mess with the
"which fs types are allowed" but to add a per-namespace "immutable" flag.
And add a new clone(2)/unshare(2) flag, used only along with the CLONE_NEWNS
and setting the "immutable" on the copied namespace.

Okay, now that I have laid out the problem, I am happy to pursue whatever solution we think is better. But let me develop it a bit more first.

An immutable flag does not work, because I don't want to prevent a luser (loved that) from messing with the mounts they are given. In general, it is perfectly fine for them to mount things inside the container as time goes on.

But some mounts I do not consider fine. Let me elaborate on the /proc example I gave: much of the information living in /proc is really global, rather than per-container. The parts pertaining to the pid namespace and other namespaces are already per-namespace, so they are fine. But there is more: some of the things /proc tracks, like cpu usage, memory, and the like, are resource-constrained by other entities, for instance, cgroups. In some cases, like /proc/stat, the information exists in cgroups, but comes from more than one cgroup. All of them are independent in nature, making it hard to come up with a coherent view.

Furthermore, there is no connection between namespaces and cgroups, so it is not at all obvious (there were discussions before) which information the process should see. Unlike with namespaces, the mere fact that a process lives in a cgroup does not really mean it is isolated from the system in this sense.

One solution is to do it all in userspace, from outside the container, and bind mount the files into the container's /proc. But that only works if we can prevent the user from mounting the real /proc again somewhere else. Not because it would screw up his system, which I don't care about, but because it would give him information about the global state of the system.
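
To make the bind-mount step concrete, here is a minimal sketch of what the host-side tool might do for a single file. It assumes Linux and a caller with CAP_SYS_ADMIN; the helper name and any paths passed to it are hypothetical.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Bind-mount a userspace-generated file over one entry of the container's
 * /proc. Both paths are supplied by the caller and are hypothetical here.
 * Returns 0 on success, -1 on failure (for example, missing privileges or
 * a path that does not exist).
 */
static int overlay_proc_file(const char *generated, const char *target)
{
    /* With MS_BIND the filesystem type and data arguments are ignored. */
    if (mount(generated, target, NULL, MS_BIND, NULL) != 0) {
        perror("mount");
        return -1;
    }
    return 0;
}
```

A caller outside the container would invoke it once per file, e.g. with a generated stat file as the source and the container's proc/stat as the target. Nothing in this sketch stops the container's root from mounting a fresh procfs elsewhere, which is exactly the gap described above.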

An immutable flag fixes this, but then it prevents all further legitimate mounts.