Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Mon, 2 Mar 2015 14:34:50 -0800

On Mon, Feb 23, 2015 at 8:16 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2015-02-23 at 07:54 -0800, Andy Lutomirski wrote:
>> On Sun, Feb 22, 2015 at 9:01 AM, James Bottomley
>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Tue, 2014-12-02 at 15:47 -0800, Andy Lutomirski wrote:
>> >> This should hopefully be a short topic, and it's possible that it'll
>> >> be settled by the time LSF/MM comes around, but:
>> >>
>> >> There's a fair amount of interest from different directions for
>> >> allowing filesystems with a backing store to be mounted (in the
>> >> mount-from-scratch sense, not the bind-mount sense) in a user
>> >> namespace.  For example, Seth has patches to allow unprivileged FUSE
>> >> mounts.  There are a few issues here, for example:
>> >>
>> >>  - What happens to device nodes in those filesystems?
>> >
>> > You have to allow device nodes in mount namespaces.  However, not all
>> > devices should be present, only the ones the owner of the namespace is
>> > allowed to either see (read only) or control (read/write).
>>
>> I agree that you need to allow device nodes, but I'm not sure that you
>> need to allow device nodes on filesystems with backing store.  Every
>> recent distro should work with devtmpfs (admittedly, we don't know how
>> devtmpfs should work in a container), but tmpfs is a decent
>> alternative.  In any event, sticking device nodes on ext4 is asking
>> for trouble with dynamic minors and such.
>
> OK, so this one is a bit off topic from your original proposal.  Because
> now we're moving on to device handling inside containers (which is also
> a big can of worms).
>
> We tend to want a strictly controlled /dev for a container, because the
> host has to make decisions about hotplug devices and pass them on to
> containers (or not) based on its policy.  This makes devtmpfs (to us)
> unfit for purpose because all that policy would have to be coded per
> container inside the kernel to make it work.  We also need to control
> access more strictly because of the disallow write and mount problem.
>
> Device nodes we pass through to the container tend to be done via bind
> mount from the host, so most of the policy logic can be in the host
> userspace.
>
> In fact, mknod is intercepted from the container and so the host polices
> policy from that end as well ... so it doesn't really matter *where* the
> device is being created ... that's not to say it couldn't be a tmpfs,
> just saying that the actual location isn't that important.  What is
> important is policing the node create action.

Agreed, as long as the fs with the device nodes isn't ext4 or some
other real fs backed by storage owned by the container (obviously).
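
For concreteness, here is a minimal sketch of what policing the node
create action could look like in host userspace.  It assumes an allowlist
written in the devices-cgroup rule format ("type major:minor access",
where "m" is the mknod permission).  The names Rule, parse_rule and
mknod_allowed are hypothetical, and how the mknod call is intercepted in
the first place is out of scope:

```python
import stat
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    dev_type: str          # "c" (char), "b" (block) or "a" (both)
    major: Optional[int]   # None means wildcard ("*")
    minor: Optional[int]   # None means wildcard ("*")
    access: str            # subset of "rwm"; "m" permits mknod


def parse_rule(text: str) -> Rule:
    """Parse one rule such as "c 1:3 rwm" or "b 8:* r"."""
    dev_type, numbers, access = text.split()
    major_s, minor_s = numbers.split(":")
    if dev_type not in ("a", "b", "c"):
        raise ValueError(f"bad device type: {dev_type!r}")
    if not access or not set(access) <= set("rwm"):
        raise ValueError(f"bad access string: {access!r}")
    return Rule(
        dev_type,
        None if major_s == "*" else int(major_s),
        None if minor_s == "*" else int(minor_s),
        access,
    )


def mknod_allowed(rules: Iterable[Rule], mode: int, major: int, minor: int) -> bool:
    """Decide whether an intercepted mknod(mode, major:minor) may proceed."""
    if stat.S_ISCHR(mode):
        dev_type = "c"
    elif stat.S_ISBLK(mode):
        dev_type = "b"
    else:
        # FIFOs, sockets and regular files are not device nodes;
        # this sketch does not police them.
        return True
    for rule in rules:
        if (
            rule.dev_type in (dev_type, "a")
            and rule.major in (None, major)
            and rule.minor in (None, minor)
            and "m" in rule.access
        ):
            return True
    # Default deny: anything not explicitly allowed is refused.
    return False
```

The check takes only the node type and device numbers, not a path, so it
gives the same answer wherever the container tries to create the node.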

--Andy