On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@xxxxxxx> wrote: > > Hi Amir, > > thanks for your mail and the quick reply! > > Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein: > > > We are seeing problems using an read-writeable overlayfs (upper) on a > > > readonly squashfs (lower). The squashfs gets an update from time to time > > > while we keep the upper overlayfs. > > > > It gets updated while the overlay is offline (not mounted) correct? > > Yes. We boot into a recovery system outside the rootfs and its overlayfs, > replace the lower squashfs, and then reboot into the new system. > > > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode > > > (-116)") messages if the lower squashfs was created _without_ using the > > > "-no-exports" switch. > > > The -ESTALE comes from ovl_get_inode() which in turn calls > > > ovl_verify_inode() and returns on the line where the upperdentry inode > > > gets compared > > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ). > > > > > > A little debugging shows, that the upper files dentry name does not fit to > > > the dentry name of the new lower dentry as it seems to look for the inode > > > on the squashfs "export"-lookup-table which has changed as we replaced > > > the lower fs. > > > > > > Building the lower squashfs with the "-no-exports"-mksquashfs option, so > > > without the export-lookup-table, seems to work, but it might be no longer > > > exportable using nfs (which is ok and we can keep with it). Miklos, At first glance I did not understand how changing lower file handles causes failure to ovl_verify_inode(). To complete the picture, here is the explanation. Upper file A was copied up from lower file with inode 10 in old squashfs and the "origin" file handle composed of the inode number 10 is recorded in upper file A. With newly formatted lower, lower A has inode 11 and lower B has inode 10. Upper file B is copied from lower file B with inode 10 in new squashfs and the "origin" file handle composed of the inode number 10 is recorded in upper file B. Now we have two upper files with the same "origin" that are not hardlinks. On lookup of both overlay files A and B, ovl_check_origin() decodes lower file B (inode 10) as the lower inode. This lower inode is used to get the overlay inode number (10) and as the key to hash overlay inode in inode cache. Suppose A is looked up first and it's inode is hashed. Then B is looked up and in ovl_get_inode() it finds the inode hashed by the same lower inode in inode cache, but fails ovl_verify_inode() because: d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */ This can also happen when copying overlay layers to a new fs tree and carrying over the old "origin" xattr. In practice, the UUID part of the stored "origin" xattr is meant to protect against decoding lower fh when migrating to another filesystem, but layers could be migrated inside the same filesystem. Since squashfs does not have a UUID, re-creating sqhashfs is similar to migrating layers inside the same filesystem. We were aware of the "layer migration" case when designing the index/nfs_export feature, which is one of the reasons they are opt-in features. But we enabled the functionality of following non-dir origin unconditionally because we *thought* it is harmless, as the comment in ovl_lookup() says: /* * Lookup copy up origin by decoding origin file handle. * We may get a disconnected dentry, which is fine, * because we only need to hold the origin inode in * cache and use its inode number. We may even get a * connected dentry, that is not under any of the lower * layers root. That is also fine for using it's inode * number - it's the same as if we held a reference * to a dentry in lower layer that was moved under us. */ The patch I posted disabled decoding of non-dir origin for the special case of lower null uuid. I think we can also warn and auto-disable decoding non-dir origin in case index is disabled and we detect this upper inode conflict in ovl_verify_inode(). The problem is if A is not metacopy and looked up first, and B is metacopy and looked up second, then conflict will be deleted after the wrong inode has been hashed. Perhaps we should disable decoding non-dir origin with in case metacopy=on,index=off? Maybe also provide a user option to disable decoding non-dir origin at the price of losing persistent inode number for copied up non-dir? Something like 'index=nofollow'. Thoughts? Am I overthinking this? Thanks, Amir.