On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote: > > > > Miklos, > > > > > > > > At first glance I did not understand how changing lower file handles causes > > > > failure to ovl_verify_inode(). > > > > To complete the picture, here is the explanation. > > > > > > > > Upper file A was copied up from lower file with inode 10 in old squashfs > > > > and the "origin" file handle composed of the inode number 10 is recorded > > > > in upper file A. > > > > > > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10. > > > > Upper file B is copied from lower file B with inode 10 in new squashfs and > > > > the "origin" file handle composed of the inode number 10 is recorded > > > > in upper file B. > > > > Now we have two upper files with the same "origin" that are not hardlinks. > > > > > > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower > > > > file B (inode 10) as the lower inode. > > > > This lower inode is used to get the overlay inode number (10) and as > > > > the key to hash overlay inode in inode cache. > > > > > > > > Suppose A is looked up first and it's inode is hashed. > > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed > > > > by the same lower inode in inode cache, but fails ovl_verify_inode() > > > > because: > > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */ > > > > > > > > This can also happen when copying overlay layers to a new > > > > fs tree and carrying over the old "origin" xattr. > > > > In practice, the UUID part of the stored "origin" xattr is meant to > > > > protect against decoding lower fh when migrating to another > > > > filesystem, but layers could be migrated inside the same filesystem. > > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar > > > > to migrating layers inside the same filesystem. > > > > > > Hi Amir, > > > > > > So we can't use "origin" if lower layers have been copied. If they > > > have been copied to a different filesystem with uuid we seem to > > > have a mechanism to detect it but otherwise not and we can run > > > into these kind of issues. > > > > > > > Correct. > > > > > My question is, why do we allow copying or updating lower layers > > > with same upper when we know this will break stored origin in > > > upper. > > > > I don't know if we "allow" this. > > So there seem to two cases. One is copying the lower layers to same > filesystem or a different filesystem. And another case is recreating > the lower layers and use previous upper with old upper. IIUC, we > are currently facing problem with second scenario. > > "copying the lower layers" is probably fine as you said because old > tree still keeps that inode number busy and newly created inode > should not acquire that number. > > But in this case looks like we recreated lower and that led to > some file B acquiring an inode number which was used by A. So > this is a different use case. IIUC, simply copying the layer > will not lead to this situation. > > Now question is do we support recreating the lower layer with existing > upper. And I see following text in "Sharing and copying layer" > > "Mounting an overlay using an upper layer path, where the upper layer path > was previously used by another mounted overlay in combination with a > different lower layer path, is allowed, unless the "inodes index" feature > or "metadata only copy up" feature is enabled." > > This seems to suggest that recreating lower layer is allowed as long > as you are not using index or metacopy feature. > > If that's the case, probably "origin" should have been an opt-in > feature or automatically be enabled by some other opt-in feature. > > Yes. "should have been". That is my point. We thought it was a victimless crime to enable "origin" without opt-in and I think we were wrong. > > We never considered the case expect > > for nfs export and index, see overlayfs.rst: > > "When the overlay NFS export feature is enabled, overlay filesystems > > behavior on offline changes of the underlying lower layer is different > > than the behavior when NFS export is disabled. ..." > > > > > Can't I do same thing with ext4/xfs, where I recreate > > > lower layers when inode numbers get exchanged and same problem > > > will happen (despite uuid being same). > > > > > > > Same problem. > > > > > > IOW, how can we support copying layers (with same upper) while origin > > > is in use. Rest of the features are built on top of the assumption > > > that origin is valid. And in case of copying layers, we don't > > > seem to have a sure way to find if origin is valid or not. > > > > > > > With index/nfs_export enabled we at least do: > > /* Verify lower root is upper root origin */ > > and if verification fails we disable the feature. > > So discrepancy is still possible if somebody modifies lower layers > without changing lower root. > > - Copy up file A. > - Unmount overlay > - unlink A > - create B (assume B gets same inode number as A). > - mount overlay; B gets copied up. > > And now we have both upper A and B having same origin despite no > hardlink. Am I understanding it right? > Yes. > Is there a good use case for allowing modifying lower layers with > same upper. Given we are adding more complex features to overlayfs, > will it make sense to not allow modifying lower layer going forward. > We might not be able to detect it but atleast it will be unsupported > configuration. And then we can only focus to provide a work around > for existing use cases. > Agreed. > [..] > > > > > > > > We were aware of the "layer migration" case when designing the > > > > index/nfs_export feature, which is one of the reasons they are > > > > opt-in features. > > > > > > > > But we enabled the functionality of following non-dir origin > > > > unconditionally because we *thought* it is harmless, as the comment > > > > in ovl_lookup() says: > > > > > > > > /* > > > > * Lookup copy up origin by decoding origin file handle. > > > > * We may get a disconnected dentry, which is fine, > > > > * because we only need to hold the origin inode in > > > > * cache and use its inode number. We may even get a > > > > * connected dentry, that is not under any of the lower > > > > * layers root. That is also fine for using it's inode > > > > * number - it's the same as if we held a reference > > > > * to a dentry in lower layer that was moved under us. > > > > */ > > > > > > > > The patch I posted disabled decoding of non-dir origin for the special > > > > case of lower null uuid. > > > > > > > > I think we can also warn and auto-disable decoding non-dir origin in > > > > case index is disabled and we detect this upper inode conflict in > > > > ovl_verify_inode(). > > > > > > > > The problem is if A is not metacopy and looked up first, and B is > > > > metacopy and looked up second, then conflict will be deleted after > > > > the wrong inode has been hashed. > > We don't allow modifying lower layers if metacopy is enabled. > > "For "metadata only copy up" feature there is no verification mechanism at > mount time. So if same upper is mounted with different set of lower, mount > probably will succeed but expect the unexpected later on. So don't do it. > " > > So if somebody is recreating lower or modifying lower with metacopy > on, its an unsupported configuration. > Very good. So this case is settled. Now, suggestions for work arounds: 1. Don't follow with lower null uuid (patch posted) - no caveats 2. Opt-out of following origin with explicit option e.g. "index=nofollow" 3. Don't follow origin unless one of the following opt-in features: metacopy,index,xino If we go for option #3, the easiest recommendation for distros would be to set: CONFIG_OVERLAY_FS_XINO_AUTO=y Apart from "compatibility with applications that expect 32bit inodes" I am not aware of any caveats to enabling XINO_AUTO. The purpose of xino feature is to make overlay st_ino/st_dev more posix-like (currently only for non samefs), so it makes some sense to tie the "origin" xattr feature that preserves the pre copy up persistent st_ino to xino. The xino documentation does not mention the samefs exemption: "...If this feature is disabled or the underlying filesystem doesn't have enough free bits in the inode number, then overlayfs will not be able to guarantee that the values of st_ino and st_dev returned by stat(2) and the value of d_ino returned by readdir(3) will act like on a normal filesystem." and the 'xino' mount option is currently not displayed with samefs. If we implement option #3, then with samefs user could enable xino if default is off and disable xino if default is auto and display xino in mount options. Thanks, Amir.