Re: shiftfs status and future development

"Serge E. Hallyn" <serge@xxxxxxxxxx> · Tue, 3 Jul 2018 11:54:50 -0500

Quoting James Bottomley (James.Bottomley@xxxxxxxxxxxxxxxxxxxxx):
> On Mon, 2018-06-18 at 20:11 +0300, Amir Goldstein wrote:
> > On Mon, Jun 18, 2018 at 5:56 PM, James Bottomley
> > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > [...]
> > > > > >  - Does not break inotify
> > > > > 
> > > > > I don't expect it does, but I haven't checked.
> > > > 
> > > > I haven't checked either; I'm planning to do so soon. This is a
> > > > concern that was expressed to me by others, I think because
> > > > inotify doesn't work with overlayfs.
> > > 
> > > I think shiftfs does work simply because it doesn't really do
> > > overlays, so lots of stuff that doesn't work with overlays does
> > > work with it.
> > > 
> > 
> > I'm afraid shiftfs suffers from the same problems that the old naiive
> > overlayfs inode implementation suffered from.
> > 
> > This problem is demonstrated with LTP tests inotify08 inotify09.
> > shiftfs_new_inode() is called on every lookup, so inotify watch
> > may be set on an inode object, then dentry is evicted from cache
> > and then all events on new dentry are not reported on the watched
> > inode. You will need to implement hashed inodes to solve it.
> > Can be done as overlay does - hashing by real inode pointer.
> > 
> > This is just one of those subtle things about stacked fs and there
> > may be other in present and more in future - if we don't have a
> > shared code base for the two stacked fs, I wager you are going to end
> > up "cherry picking" fixes often.
> > 
> > IMO, an important question to ask is, since both shiftfs and
> > overlayfs are strongly coupled with container use cases, are there
> > users that are interested in both layering AND shifting? on the same
> > "mark"? If the answer is yes, then this may be an argument in favor
> > of integrating at least some of shittfs functionality into overlayfs.
> 
> My container use case is interested in shifting but not layering.  Even
> the docker use case would only mix the two with the overlay graph
> driver.  There seem to be quite a few clouds using non overlayfs graph
> drivers (the dm one being the most popular).
> 
> > Another argument is that shiftfs itself takes the maximum allowed
> > 2 levels of s_stack_depth for it's 2 mounts, so it is actually not
> > possible with current VFS limitation to combine shiftfs with
> > overlayfs.
> 
> That's an artificial, not an inherent, restriction that was introduced
> to keep the call stack small.  It can be increased or even eliminated
> (although then we'd risk a real run off the end of the kernel stack
> problem).
> 
> > This could be solved relatively easily by adding "-o mark" support
> > to overlayfs and allowing to mount shiftfs also over "marked"
> > overlayfs inside container.
> 
> Can we please decided whether the temporary mark, as implemented in the
> current patch set or a more permanent security.<something> xattr type
> mark is preferred for this?  It's an important question that's been
> asked, but we have no resolution on.

I think something permanent is mandatory.  Otherwise users may be able
to induce a reboot into a state where the temp mark isn't made.  A
security.<something> xattr has the problem that an older kernel may not
know about it.

Two possibilities which have been mentioned before:

1. just demand that the *source* idmap doesn't start at 0.  Ideally it
would be something like 100k uids starting at 100k.  The kernel would
refuse to do a shiftfs mount if the source idmap includes uid 0.

I suppose the "let-them-shoot-themselves-in-the-foot" approach would be
to just strongly recommend using such a source uid mapping, but not
enforce it.

2. Enforce that the base directory have perms 700 for shiftfs-mount to
be allowed.  So /var/lib/shiftfs/base-rootfs might be root-owned with
700 perms.  Root then can shiftfs-mount it to /container1 uid-shifted to
100000:0:100000.  Yes, root could stupidly change the perms later, but
failing that uid 1000 can never exploit a setuid-root script under
/var/lib/shiftfs/base-rootfs

-serge