Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping

On Mon, 2015-02-23 at 07:54 -0800, Andy Lutomirski wrote:
> On Sun, Feb 22, 2015 at 9:01 AM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, 2014-12-02 at 15:47 -0800, Andy Lutomirski wrote:
> >> This should hopefully be a short topic, and it's possible that it'll
> >> be settled by the time LSF/MM comes around, but:
> >>
> >> There's a fair amount of interest from different directions for
> >> allowing filesystems with a backing store to be mounted (in the
> >> mount-from-scratch sense, not the bind-mount sense) in a user
> >> namespace.  For example, Seth has patches to allow unprivileged FUSE
> >> mounts.  There are a few issues here, for example:
> >>
> >>  - What happens to device nodes in those filesystems?
> >
> > You have to allow device nodes in mount namespaces.  However, not all
> > devices should be present, only the ones the owner of the namespace is
> > allowed to either see (read only) or control (read/write).
> 
> I agree that you need to allow device nodes, but I'm not sure that you
> need to allow device nodes on filesystems with backing store.  Every
> recent distro should work with devtmpfs (admittedly, we don't know how
> devtmpfs should work in a container), but tmpfs is a decent
> alternative.  In any event, sticking device nodes on ext4 is asking
> for trouble with dynamic minors and such.

OK, so this one is a bit off topic from your original proposal, because
now we're moving on to device handling inside containers (which is also
a big can of worms).

We tend to want a strictly controlled /dev for a container, because the
host has to make decisions about hotplug devices and pass them on to
containers (or not) based on its policy.  This makes devtmpfs (to us)
unfit for purpose because all that policy would have to be coded per
container inside the kernel to make it work.  We also need to control
access more strictly because of the write-and-mount problem (a user who
can both write to a device and mount it).

Device nodes we pass through to the container tend to be done via bind
mount from the host, so most of the policy logic can be in the host
userspace.

In fact, mknod from the container is intercepted, so the host enforces
policy from that end as well ... so it doesn't really matter *where* the
device is being created ... that's not to say it couldn't be a tmpfs,
just that the actual location isn't that important.  What is
important is policing the node create action.
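
To make "policing the node create action" concrete, here is a rough
sketch of the kind of check a host-side agent might run when it sees an
intercepted mknod.  This is only an illustration under assumptions: the
DeviceRule type, the POLICY table and police_mknod() are hypothetical
names, not any existing container runtime's interface, and the device
numbers are just examples.  The read-only versus read/write split
follows the "see" versus "control" distinction earlier in the thread.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceRule:
    """One hypothetical host policy entry for a single container."""
    kind: int       # stat.S_IFCHR or stat.S_IFBLK
    major: int
    minor: int
    writable: bool  # False: the container may see the device but not write


# Hypothetical policy: char 1:3 and 1:5 read/write, block 8:0 read-only.
POLICY = (
    DeviceRule(stat.S_IFCHR, 1, 3, True),
    DeviceRule(stat.S_IFCHR, 1, 5, True),
    DeviceRule(stat.S_IFBLK, 8, 0, False),
)


def police_mknod(mode, dev, policy=POLICY):
    """Decide what to do with an intercepted mknod request.

    mode and dev carry the same meaning as the mknod(2) arguments.
    The path is deliberately not an input: the decision depends on
    which node is being created, not on where.
    """
    kind = stat.S_IFMT(mode)
    if kind not in (stat.S_IFCHR, stat.S_IFBLK):
        return "passthrough"  # FIFOs, regular files etc. are not policed here
    for rule in policy:
        if (rule.kind, rule.major, rule.minor) == (
            kind, os.major(dev), os.minor(dev)
        ):
            return "allow-rw" if rule.writable else "allow-ro"
    return "deny"  # device not listed in this container's policy
```

A caller would pass the intercepted arguments straight through, e.g.
police_mknod(stat.S_IFCHR | 0o666, os.makedev(1, 3)), and create or
bind mount the node only when the answer is one of the allow values.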

However, other container people need to chime in here.  I tend to think
that hotplug handling inside the container is unnecessary (certainly in
a hosting/VPS environment), but I believe there are other potential
users of it who have different ideas.

> > The specific problem for container security is allowing the user who can
> > write to the device also to mount it ... because that lets them inject
> > data known to cause a kernel crash and bring down the entire system or
> > worse.  The current solution is simply not to allow the owner both to
> > write and mount, but this is becoming increasingly untenable using
> > loopback images with containers for cascading overlays like docker does.
> 
> I see this as a separate issue.  If the kernel has no implementation
> bugs, this would be a nonissue :)

Right, I started another thread on this.

James

