Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping

Jan Kara <jack@xxxxxxx> · Mon, 23 Feb 2015 13:38:19 +0100

On Sun 22-02-15 09:12:35, James Bottomley wrote:
> On Tue, 2014-12-02 at 21:37 -0600, Eric W. Biederman wrote:
> > Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> > 
> > > This should hopefully be a short topic, and it's possible that it'll
> > > be settled by the time LSF/MM comes around, but:
> > >
> > > There's a fair amount of interest from different directions for
> > > allowing filesystems with a backing store to be mounted (in the
> > > mount-from-scratch sense, not the bind-mount sense) in a user
> > > namespace.  For example, Seth has patches to allow unprivileged FUSE
> > > mounts.  There are a few issues here, for example:
> > >
> > >  - What happens to device nodes in those filesystems?
> > >
> > >  - If a FUSE backend is in a user namespace, how should UIDs be
> > > translated to/from that backend?
> > >
> > >  - How should LSM security labels be translated?
> > >
> > >  - Should a struct super_block be associated with a user namespace?
> > > (Answer: probably, I think.)  If so, what should the semantics be?
> > >
> > > There are also some remapping cases that aren't directly user
> > > namespace-related.  For example, I'd like to be able to insert
> > > removable media and create files owned by uid 0 (or any other uid)
> > > without actually being root.
> > 
> > And there is the longer term question that may be more appropriate when
> > we get all of the id problems settled, about what kind of
> > testing, auditing, review we want in place before we believe an
> > unprivileged mount is actually safe to perform, when we can assume
> > hostile intent by the mounter.
> 
> Realistically, we can't rely on auditing the data: a hostile user will
> be injecting a specific data pattern to exploit a bug in the filesystem
> code.  We can't audit for this if we don't know the bug (which we mostly
> don't, otherwise they'd be fixed).
> 
> What we can do is audit for specific operations.  Looking at what the
> use cases are, users mostly either want to create a pristine filesystem
> or use an existing template.  Mkfs is particularly nasty because it's
  Well, what if you also had templates for pristine filesystems? There
aren't that many sensible configs and a compressed empty fs image is pretty
small... Sure, users won't be able to fine-tune their fs configuration, but
is that so important? Most users don't do that anyway.

Alternatively, you could just forbid writing from the container, and if a
user wants to create an fs image, they'd just pass the mkfs options to some
service which would run mkfs outside of the container. It isn't neat, but
when I see the hacks you are describing below, it doesn't seem like such a
bad option :)
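
To make the "pass options to a service" idea concrete, here is a rough
sketch of the validation step such a host-side service might perform before
running mkfs. Everything in it is an assumption for illustration: the
function name, the allowlist, and the choice of mkfs.ext4 are hypothetical
and do not describe an existing service. The only real interface used is
the set of mke2fs options -b, -L and -m.

```python
import re

# Hypothetical allowlist: each permitted mkfs.ext4 option maps to a
# validator for its value. Anything not listed here is rejected.
ALLOWED_OPTIONS = {
    # block size in bytes
    "-b": lambda v: v in {"1024", "2048", "4096"},
    # volume label (ext4 labels are at most 16 bytes)
    "-L": lambda v: re.fullmatch(r"[A-Za-z0-9_-]{1,16}", v) is not None,
    # reserved blocks percentage, limited here to 0..50
    "-m": lambda v: re.fullmatch(r"[0-9]{1,2}", v) is not None
    and int(v) <= 50,
}


def build_mkfs_argv(requested, image_path):
    """Return an argv list for mkfs.ext4, or raise ValueError.

    `requested` is a list of (option, value) string pairs sent by the
    container. `image_path` is chosen by the host-side service and is
    never taken from the container's request.
    """
    argv = ["mkfs.ext4"]
    seen = set()
    for opt, value in requested:
        check = ALLOWED_OPTIONS.get(opt)
        if check is None:
            raise ValueError(f"option not allowed: {opt!r}")
        if opt in seen:
            raise ValueError(f"option given twice: {opt!r}")
        if not check(value):
            raise ValueError(f"bad value for {opt}: {value!r}")
        seen.add(opt)
        argv += [opt, value]
    argv.append(image_path)
    return argv
```

The service would then run the returned list directly (for example with
subprocess.run(argv, check=True), no shell involved), so the container only
ever influences a small, checked set of parameters and never writes to the
device itself.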

> all in userspace and sprays data down on to the device making it really
> hard to audit.  One of the approaches we've experimented with in
> Parallels is the bit bucket one, where we create a device that looks
> read/write in the container, but really it throws away the writes from
> the user and performs in the host the operation we believe the user is
> trying to do.  It protects against most injection attacks, but trips up
> when the user tries to do some operation we haven't anticipated.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR