Re: shiftfs status and future development

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 27 Jun 2018 13:17:40 +0300

[cc some overlayfs folks]

On Wed, Jun 27, 2018 at 10:48 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2018-06-18 at 20:11 +0300, Amir Goldstein wrote:
>> On Mon, Jun 18, 2018 at 5:56 PM, James Bottomley
>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> [...]
>> > > > >  - Does not break inotify
>> > > >
>> > > > I don't expect it does, but I haven't checked.
>> > >
>> > > I haven't checked either; I'm planning to do so soon. This is a
>> > > concern that was expressed to me by others, I think because
>> > > inotify doesn't work with overlayfs.
>> >
>> > I think shiftfs does work simply because it doesn't really do
>> > overlays, so lots of stuff that doesn't work with overlays does
>> > work with it.
>> >
>>
>> I'm afraid shiftfs suffers from the same problems that the old naiive
>> overlayfs inode implementation suffered from.
>>
>> This problem is demonstrated with LTP tests inotify08 inotify09.
>> shiftfs_new_inode() is called on every lookup, so inotify watch
>> may be set on an inode object, then dentry is evicted from cache
>> and then all events on new dentry are not reported on the watched
>> inode. You will need to implement hashed inodes to solve it.
>> Can be done as overlay does - hashing by real inode pointer.
>>
>> This is just one of those subtle things about stacked fs and there
>> may be other in present and more in future - if we don't have a
>> shared code base for the two stacked fs, I wager you are going to end
>> up "cherry picking" fixes often.
>>
>> IMO, an important question to ask is, since both shiftfs and
>> overlayfs are strongly coupled with container use cases, are there
>> users that are interested in both layering AND shifting? on the same
>> "mark"? If the answer is yes, then this may be an argument in favor
>> of integrating at least some of shittfs functionality into overlayfs.
>
> My container use case is interested in shifting but not layering.  Even
> the docker use case would only mix the two with the overlay graph
> driver.  There seem to be quite a few clouds using non overlayfs graph
> drivers (the dm one being the most popular).

To be clear, I did not mean that shifting will be possible only for
overlayfs users, I meant that code base could be shared.

One option is what I did with 'snapshot' fs.
It creates a new filesystem_type, reuses most of the overlayfs
internal structures and many of the overlayfs operations and
implements some of its own:
https://github.com/amir73il/linux/commit/49081195e1c085b0f02d73f619de5ec0f1113f08

One point of similarity with shiftfs, is that snapshot fs
has no lowerdir, only upperdir, e.g.:
   mount -t snapshot none none -oupperdir=/foo

And it only took a single line of change in overlayfs code
(in ovl_posix_acl_xattr_set) to support an upper-only configuration.

>
>> Another argument is that shiftfs itself takes the maximum allowed
>> 2 levels of s_stack_depth for it's 2 mounts, so it is actually not
>> possible with current VFS limitation to combine shiftfs with
>> overlayfs.
>
> That's an artificial, not an inherent, restriction that was introduced
> to keep the call stack small.  It can be increased or even eliminated
> (although then we'd risk a real run off the end of the kernel stack
> problem).
>

Yeh, only need to get that change passed Al ;-)

In an earlier reply I proposed a solution to save 1 stacking layer:
   mount -oremount,shift

To change credentials shifting from 'mark' (=deny all access)
to 'shift' (=shift to re-mounter user_ns).

>> This could be solved relatively easily by adding "-o mark" support
>> to overlayfs and allowing to mount shiftfs also over "marked"
>> overlayfs inside container.
>
> Can we please decided whether the temporary mark, as implemented in the
> current patch set or a more permanent security.<something> xattr type
> mark is preferred for this?  It's an important question that's been
> asked, but we have no resolution on.
>

My 2 cents: it is generally considered a bad idea to let users
mess around with overlay upper/work dirs directly, but kernel doesn't do
anything to prevent that. It is completely up to the machine
admin or container runtime to setup security policies to prevent that.

>> Anyway, I'm just playing devil's advocate to the idea of two stacked
>> fs implementation, so presenting this point of view. I am fully aware
>> that there are also plenty of disadvantages to couple two unrelated
>> functionalities together.
>
> The biggest one seems to be that the points at which overlayfs and
> shiftfs do credential shifting are subtly different.  That's not to say
> they can't be unified, but there's some work to do to prove it's
> possible.
>

That is true. I'd like to point your attention to this patch from
Android developers that intends to modify the existing credential
shitfting behavior of overlayfs:
https://marc.info/?l=linux-unionfs&m=153003238029749&w=2

If we accept a second credential shifting paradigm, its an opening
to accept the third one...

Thanks,
Amir.