> > Miklos, > > > > At first glance I did not understand how changing lower file handles causes > > failure to ovl_verify_inode(). > > To complete the picture, here is the explanation. > > > > Upper file A was copied up from lower file with inode 10 in old squashfs > > and the "origin" file handle composed of the inode number 10 is recorded > > in upper file A. > > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10. > > Upper file B is copied from lower file B with inode 10 in new squashfs and > > the "origin" file handle composed of the inode number 10 is recorded > > in upper file B. > > Now we have two upper files with the same "origin" that are not hardlinks. > > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower > > file B (inode 10) as the lower inode. > > This lower inode is used to get the overlay inode number (10) and as > > the key to hash overlay inode in inode cache. > > > > Suppose A is looked up first and it's inode is hashed. > > Then B is looked up and in ovl_get_inode() it finds the inode hashed > > by the same lower inode in inode cache, but fails ovl_verify_inode() > > because: > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */ > > > > This can also happen when copying overlay layers to a new > > fs tree and carrying over the old "origin" xattr. > > In practice, the UUID part of the stored "origin" xattr is meant to > > protect against decoding lower fh when migrating to another > > filesystem, but layers could be migrated inside the same filesystem. > > Since squashfs does not have a UUID, re-creating sqhashfs is similar > > to migrating layers inside the same filesystem. > > Hi Amir, > > So we can't use "origin" if lower layers have been copied. If they > have been copied to a different filesystem with uuid we seem to > have a mechanism to detect it but otherwise not and we can run > into these kind of issues. > Correct. > My question is, why do we allow copying or updating lower layers > with same upper when we know this will break stored origin in > upper. I don't know if we "allow" this. We never considered the case expect for nfs export and index, see overlayfs.rst: "When the overlay NFS export feature is enabled, overlay filesystems behavior on offline changes of the underlying lower layer is different than the behavior when NFS export is disabled. ..." > Can't I do same thing with ext4/xfs, where I recreate > lower layers when inode numbers get exchanged and same problem > will happen (despite uuid being same). > Same problem. > IOW, how can we support copying layers (with same upper) while origin > is in use. Rest of the features are built on top of the assumption > that origin is valid. And in case of copying layers, we don't > seem to have a sure way to find if origin is valid or not. > With index/nfs_export enabled we at least do: /* Verify lower root is upper root origin */ and if verification fails we disable the feature. > Should we put a restrictions that if lower layers are updated or > moved then upper should be rotated as lower layer and a new > fresh upper should be used instead? > Don't know and it doesn't matter. Fabian's bug report is for an old setup that used to work. He did not opt-in to any new feature. In kernel v4.12 we added "origin" for persistent inode numbers on same-fs and we didn't make it an opt-in feature. This is when the regression of re-creating lower happened, but back then squashfs (null uuid) did not suffer from the problem. > I might be missing something very fundamental, but before I try > to understand fine details of all the features built on top of > "origin", I fail to understand that how can we allow changing > lower layers and still expect "origin" to be valid and use it. > See the text below. We *thought* that using "origin" for persistent st_ino was safe even if lower layers were copied on the same fs, because we thought If the file handle can be decoded then it guarantees a unique inode number even if it is in the old lower tree and not the new copied tree. For me the reported bug was an oversight, but maybe Miklos has another way of looking at this. Thanks, Amir. > > > > We were aware of the "layer migration" case when designing the > > index/nfs_export feature, which is one of the reasons they are > > opt-in features. > > > > But we enabled the functionality of following non-dir origin > > unconditionally because we *thought* it is harmless, as the comment > > in ovl_lookup() says: > > > > /* > > * Lookup copy up origin by decoding origin file handle. > > * We may get a disconnected dentry, which is fine, > > * because we only need to hold the origin inode in > > * cache and use its inode number. We may even get a > > * connected dentry, that is not under any of the lower > > * layers root. That is also fine for using it's inode > > * number - it's the same as if we held a reference > > * to a dentry in lower layer that was moved under us. > > */ > > > > The patch I posted disabled decoding of non-dir origin for the special > > case of lower null uuid. > > > > I think we can also warn and auto-disable decoding non-dir origin in > > case index is disabled and we detect this upper inode conflict in > > ovl_verify_inode(). > > > > The problem is if A is not metacopy and looked up first, and B is > > metacopy and looked up second, then conflict will be deleted after > > the wrong inode has been hashed. > > > > Perhaps we should disable decoding non-dir origin with in case > > metacopy=on,index=off? > > Maybe also provide a user option to disable decoding non-dir origin > > at the price of losing persistent inode number for copied up non-dir? > > Something like 'index=nofollow'. > > > > Thoughts? > > Am I overthinking this? > > > > Thanks, > > Amir. > > >