[cc some overlayfs folks] On Wed, Jun 27, 2018 at 10:48 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > On Mon, 2018-06-18 at 20:11 +0300, Amir Goldstein wrote: >> On Mon, Jun 18, 2018 at 5:56 PM, James Bottomley >> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: >> [...] >> > > > > - Does not break inotify >> > > > >> > > > I don't expect it does, but I haven't checked. >> > > >> > > I haven't checked either; I'm planning to do so soon. This is a >> > > concern that was expressed to me by others, I think because >> > > inotify doesn't work with overlayfs. >> > >> > I think shiftfs does work simply because it doesn't really do >> > overlays, so lots of stuff that doesn't work with overlays does >> > work with it. >> > >> >> I'm afraid shiftfs suffers from the same problems that the old naiive >> overlayfs inode implementation suffered from. >> >> This problem is demonstrated with LTP tests inotify08 inotify09. >> shiftfs_new_inode() is called on every lookup, so inotify watch >> may be set on an inode object, then dentry is evicted from cache >> and then all events on new dentry are not reported on the watched >> inode. You will need to implement hashed inodes to solve it. >> Can be done as overlay does - hashing by real inode pointer. >> >> This is just one of those subtle things about stacked fs and there >> may be other in present and more in future - if we don't have a >> shared code base for the two stacked fs, I wager you are going to end >> up "cherry picking" fixes often. >> >> IMO, an important question to ask is, since both shiftfs and >> overlayfs are strongly coupled with container use cases, are there >> users that are interested in both layering AND shifting? on the same >> "mark"? If the answer is yes, then this may be an argument in favor >> of integrating at least some of shittfs functionality into overlayfs. > > My container use case is interested in shifting but not layering. Even > the docker use case would only mix the two with the overlay graph > driver. There seem to be quite a few clouds using non overlayfs graph > drivers (the dm one being the most popular). To be clear, I did not mean that shifting will be possible only for overlayfs users, I meant that code base could be shared. One option is what I did with 'snapshot' fs. It creates a new filesystem_type, reuses most of the overlayfs internal structures and many of the overlayfs operations and implements some of its own: https://github.com/amir73il/linux/commit/49081195e1c085b0f02d73f619de5ec0f1113f08 One point of similarity with shiftfs, is that snapshot fs has no lowerdir, only upperdir, e.g.: mount -t snapshot none none -oupperdir=/foo And it only took a single line of change in overlayfs code (in ovl_posix_acl_xattr_set) to support an upper-only configuration. > >> Another argument is that shiftfs itself takes the maximum allowed >> 2 levels of s_stack_depth for it's 2 mounts, so it is actually not >> possible with current VFS limitation to combine shiftfs with >> overlayfs. > > That's an artificial, not an inherent, restriction that was introduced > to keep the call stack small. It can be increased or even eliminated > (although then we'd risk a real run off the end of the kernel stack > problem). > Yeh, only need to get that change passed Al ;-) In an earlier reply I proposed a solution to save 1 stacking layer: mount -oremount,shift To change credentials shifting from 'mark' (=deny all access) to 'shift' (=shift to re-mounter user_ns). >> This could be solved relatively easily by adding "-o mark" support >> to overlayfs and allowing to mount shiftfs also over "marked" >> overlayfs inside container. > > Can we please decided whether the temporary mark, as implemented in the > current patch set or a more permanent security.<something> xattr type > mark is preferred for this? It's an important question that's been > asked, but we have no resolution on. > My 2 cents: it is generally considered a bad idea to let users mess around with overlay upper/work dirs directly, but kernel doesn't do anything to prevent that. It is completely up to the machine admin or container runtime to setup security policies to prevent that. >> Anyway, I'm just playing devil's advocate to the idea of two stacked >> fs implementation, so presenting this point of view. I am fully aware >> that there are also plenty of disadvantages to couple two unrelated >> functionalities together. > > The biggest one seems to be that the points at which overlayfs and > shiftfs do credential shifting are subtly different. That's not to say > they can't be unified, but there's some work to do to prove it's > possible. > That is true. I'd like to point your attention to this patch from Android developers that intends to modify the existing credential shitfting behavior of overlayfs: https://marc.info/?l=linux-unionfs&m=153003238029749&w=2 If we accept a second credential shifting paradigm, its an opening to accept the third one... Thanks, Amir.