Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Sun, 22 Feb 2015 09:12:35 -0800

On Tue, 2014-12-02 at 21:37 -0600, Eric W. Biederman wrote:
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> 
> > This should hopefully be a short topic, and it's possible that it'll
> > be settled by the time LSF/MM comes around, but:
> >
> > There's a fair amount of interest from different directions for
> > allowing filesystems with a backing store to be mounted (in the
> > mount-from-scratch sense, not the bind-mount sense) in a user
> > namespace.  For example, Seth has patches to allow unprivileged FUSE
> > mounts.  There are a few issues here, for example:
> >
> >  - What happens to device nodes in those filesystems?
> >
> >  - If a FUSE backend is in a user namespace, how should UIDs be
> > translated to/from that backend?
> >
> >  - How should LSM security labels be translated?
> >
> >  - Should a struct super_block be associated with a user namespace?
> > (Answer: probably, I think.)  If so, what should the semantics be?
> >
> > There are also some remapping cases that aren't directly user
> > namespace-related.  For example, I'd like to be able to insert
> > removable media and create files owned by uid 0 (or any other uid)
> > without actually being root.
> 
> And there is the longer term question that may be more appropriate when
> we get all of the id problems settled, about what kind of
> testing, auditing, review we want in place before we believe an
> unprivileged mount is actually safe to perform, when we can assume
> hostile intent by the mounter.

Realistically, we can't rely on auditing the data: a hostile user will
be injecting a specific data pattern to exploit a bug in the filesystem
code.  We can't audit for this if we don't know the bug (which we mostly
don't; otherwise it would already be fixed).

What we can do is audit for specific operations.  Looking at what the
use cases are, users mostly either want to create a pristine filesystem
or use an existing template.  Mkfs is a particularly nasty case because
it's all in userspace and sprays data onto the device, making it really
hard to audit.  One of the approaches we've experimented with in
Parallels is the bit bucket one, where we create a device that looks
read/write in the container, but really it throws away the writes from
the user and performs in the host the operation we believe the user is
trying to do.  It protects against most injection attacks, but trips up
when the user tries to do some operation we haven't anticipated.
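
To make the bit bucket idea concrete, here is a minimal userspace sketch
in Python.  It is an illustration only, not the Parallels code: the
names BitBucketDevice, host_perform and ALLOWED_FS are hypothetical, and
the "host operation" is just recorded in a list rather than actually
run.  It shows the two halves of the scheme: writes from the container
are accepted and discarded, and the host separately performs only the
operations it has anticipated.

```python
class BitBucketDevice:
    """Hypothetical device that looks read/write but discards all writes."""

    def __init__(self, size):
        self.size = size            # advertised device size in bytes
        self.discarded_bytes = 0    # bookkeeping only; no data is kept

    def _check(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("access outside device")

    def write(self, offset, data):
        # Report success to the container, but throw the data away.
        self._check(offset, len(data))
        self.discarded_bytes += len(data)
        return len(data)

    def read(self, offset, length):
        # Nothing was stored, so reads in this sketch return zeroes.
        self._check(offset, length)
        return bytes(length)


# Assumption: the host only anticipates mkfs for these filesystem types.
ALLOWED_FS = {"ext4", "xfs"}


def host_perform(intent, fs_type, device, log):
    """Do, on the host, what we believe the user was trying to do.

    Returns False for any operation we have not anticipated, which is
    exactly where this approach trips up.
    """
    if intent != "mkfs" or fs_type not in ALLOWED_FS:
        return False
    # A real host would run its own trusted mkfs here; we just record it.
    log.append(("mkfs", fs_type, device.size))
    return True
```

The point of the split is that nothing the user wrote ever reaches the
filesystem code on the host; the cost is that an unanticipated request
(say, a tune operation) simply fails.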

James
