On 01/24/2012 04:04 AM, Eric W. Biederman wrote:
Glauber Costa <glommer@xxxxxxxxxxxxx> writes:
This patch creates a list of allowed filesystems per-namespace.
The goal is to prevent users inside a container, even root,
from mounting filesystems that are not allowed by the main box admin.
My main two motivators to pursue this are:
1) We want to present a certain tailored view of some virtual
filesystems, for example, by bind-mounting files with
userspace-generated data into /proc. The ability to mount /proc inside
the container works against this effort, while disallowing it
via capabilities would have the effect of disallowing other
mounts as well.
2) Some filesystems are known not to behave well under a container
environment. They require changes to work in a safe way. We can
whitelist only the filesystems we want.
This works as a whitelist. Only filesystems in the list are allowed
to be mounted. Doing a blacklist would create problems when, say,
a module is loaded. The whitelist is only checked if it is enabled first.
So any setup that was already working will keep working. And whoever
is not interested in limiting filesystem mounts does not need
to bother with it.
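
The whitelist semantics described above can be modelled in a few lines of
userspace code. This is only a sketch of the policy, not the patch itself:
the class and method names (Namespace, enable_whitelist, may_mount) are
hypothetical and do not correspond to any kernel interface.

```python
class Namespace:
    """Hypothetical userspace model of a namespace with a filesystem whitelist."""

    def __init__(self):
        # The whitelist is off by default, so existing setups are unaffected.
        self.whitelist_enabled = False
        self.allowed = set()

    def enable_whitelist(self, fstypes):
        """Turn the whitelist on and record the allowed filesystem types."""
        self.allowed = set(fstypes)
        self.whitelist_enabled = True

    def may_mount(self, fstype):
        """Return True if a mount of the given filesystem type is permitted."""
        if not self.whitelist_enabled:
            # Whitelist not enabled: nothing is checked, everything is allowed.
            return True
        # Whitelist enabled: only listed types may be mounted. A filesystem
        # registered later (for example by a module load) is denied by default.
        return fstype in self.allowed


if __name__ == "__main__":
    ns = Namespace()
    print(ns.may_mount("proc"))       # whitelist disabled
    ns.enable_whitelist(["tmpfs", "devpts"])
    print(ns.may_mount("tmpfs"))      # listed
    print(ns.may_mount("proc"))       # not listed
```

The point of the model is the default: with a blacklist, a filesystem type
that shows up after the list was written would be allowed until someone
notices; with a whitelist it stays denied.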
My first impression is that this looks like a hack to avoid finishing
the user namespace.
This is a terrible way to go about implementing unprivileged mounts.
If there are technical reasons why it is unsafe to mount filesystems,
then we need to whitelist/blacklist filesystems in the kernel, where we
can check things.
Why in the world would anyone want the ability to not mount a specific
filesystem type?
See my reply to Al. So again, to avoid steering the discussions to
details I myself don't consider central (since this is a first post
anyway), let's focus on the /proc container case. It is a privileged
user as far as the container goes, and we'd like to allow it to mount
filesystems. But disallowing it from mounting /proc can guarantee that the
user will be provided with a version of /proc that is safe, and that he
can't escape it.
Ideally, userspace wouldn't even get involved with this, and a process
mounting /proc would see the right things, depending on where it came
from. But it turns out that the cgroups-controlled resources are a lot
harder than the namespaces-controlled resources for this.
Using netlink as an interface when you are talking filesystems to
filesystem people is pretty horrid. Netlink is great for networking
developers; they get networking. But filesystem people understand
filesystems, and you want to use netlink?
Well, I am not doing it for filesystem people, but for people who are
neither, aka whoever wants to use this interface. But that said, I don't want to keep
the discussion around this. My main reason was to have a quick way to
communicate this list to the kernel, so I could test it, and post a PoC
for you guys to comment on. Even if everybody liked it, I was prepared
from the start to redesign the interface.