Quoting James Bottomley (James.Bottomley@xxxxxxxxxxxxxxxxxxxxx): > On Mon, 2018-06-18 at 20:11 +0300, Amir Goldstein wrote: > > On Mon, Jun 18, 2018 at 5:56 PM, James Bottomley > > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > > [...] > > > > > > - Does not break inotify > > > > > > > > > > I don't expect it does, but I haven't checked. > > > > > > > > I haven't checked either; I'm planning to do so soon. This is a > > > > concern that was expressed to me by others, I think because > > > > inotify doesn't work with overlayfs. > > > > > > I think shiftfs does work simply because it doesn't really do > > > overlays, so lots of stuff that doesn't work with overlays does > > > work with it. > > > > > > > I'm afraid shiftfs suffers from the same problems that the old naiive > > overlayfs inode implementation suffered from. > > > > This problem is demonstrated with LTP tests inotify08 inotify09. > > shiftfs_new_inode() is called on every lookup, so inotify watch > > may be set on an inode object, then dentry is evicted from cache > > and then all events on new dentry are not reported on the watched > > inode. You will need to implement hashed inodes to solve it. > > Can be done as overlay does - hashing by real inode pointer. > > > > This is just one of those subtle things about stacked fs and there > > may be other in present and more in future - if we don't have a > > shared code base for the two stacked fs, I wager you are going to end > > up "cherry picking" fixes often. > > > > IMO, an important question to ask is, since both shiftfs and > > overlayfs are strongly coupled with container use cases, are there > > users that are interested in both layering AND shifting? on the same > > "mark"? If the answer is yes, then this may be an argument in favor > > of integrating at least some of shittfs functionality into overlayfs. > > My container use case is interested in shifting but not layering. Even > the docker use case would only mix the two with the overlay graph > driver. There seem to be quite a few clouds using non overlayfs graph > drivers (the dm one being the most popular). > > > Another argument is that shiftfs itself takes the maximum allowed > > 2 levels of s_stack_depth for it's 2 mounts, so it is actually not > > possible with current VFS limitation to combine shiftfs with > > overlayfs. > > That's an artificial, not an inherent, restriction that was introduced > to keep the call stack small. It can be increased or even eliminated > (although then we'd risk a real run off the end of the kernel stack > problem). > > > This could be solved relatively easily by adding "-o mark" support > > to overlayfs and allowing to mount shiftfs also over "marked" > > overlayfs inside container. > > Can we please decided whether the temporary mark, as implemented in the > current patch set or a more permanent security.<something> xattr type > mark is preferred for this? It's an important question that's been > asked, but we have no resolution on. I think something permanent is mandatory. Otherwise users may be able to induce a reboot into a state where the temp mark isn't made. A security.<something> xattr has the problem that an older kernel may not know about it. Two possibilities which have been mentioned before: 1. just demand that the *source* idmap doesn't start at 0. Ideally it would be something like 100k uids starting at 100k. The kernel would refuse to do a shiftfs mount if the source idmap includes uid 0. I suppose the "let-them-shoot-themselves-in-the-foot" approach would be to just strongly recommend using such a source uid mapping, but not enforce it. 2. Enforce that the base directory have perms 700 for shiftfs-mount to be allowed. So /var/lib/shiftfs/base-rootfs might be root-owned with 700 perms. Root then can shiftfs-mount it to /container1 uid-shifted to 100000:0:100000. Yes, root could stupidly change the perms later, but failing that uid 1000 can never exploit a setuid-root script under /var/lib/shiftfs/base-rootfs -serge