On Mon, Jul 6, 2020 at 7:10 PM Fabian <godi.beat@xxxxxxx> wrote:
>
> Hi Amir,
>
> On Monday, 6 July 2020, 17:33:54 CEST, Amir Goldstein wrote:
> > On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@xxxxxxx> wrote:
> > > Hi Amir,
> > >
> > > thanks for your mail and the quick reply!
> > >
> > > On Monday, 6 July 2020, 16:29:51 CEST, Amir Goldstein wrote:
> > > > > We are seeing problems using a read-writable overlayfs (upper) on a
> > > > > read-only squashfs (lower). The squashfs gets an update from time to
> > > > > time while we keep the upper overlayfs.
> > > >
> > > > It gets updated while the overlay is offline (not mounted) correct?
> > >
> > > Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> > > replace the lower squashfs, and then reboot into the new system.
> > >
> > > > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > > > (-116)") messages if the lower squashfs was created _without_ using
> > > > > the "-no-exports" switch.
> > > > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > > > gets compared
> > > > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
> > > > >
> > > > > A little debugging shows that the upper file's dentry name does not
> > > > > match the dentry name of the new lower dentry, as it seems to look for
> > > > > the inode in the squashfs "export" lookup table, which has changed as
> > > > > we replaced the lower fs.
> > > > >
> > > > > Building the lower squashfs with the "-no-exports" mksquashfs option,
> > > > > so without the export lookup table, seems to work, but it might no
> > > > > longer be exportable using nfs (which is ok and we can live with it).
> > > > >
> > > > > As we didn't find any other information regarding this behaviour or
> > > > > anyone who also had this problem before, we just want to know if this
> > > > > is the right way to use the rw overlayfs on a (replaceable) ro
> > > > > squashfs filesystem.
> > > > >
> > > > > Is this a known issue? Is it really necessary to disable the export
> > > > > feature when using overlayfs on a squashfs if we later need to replace
> > > > > the squashfs during an update? Any hints on what we could look at if
> > > > > this should work and we might have done something wrong during
> > > > > squashfs or overlayfs creation?
> > > >
> > > > This sounds like an unintentional outcome of:
> > > > 9df085f3c9a2 ovl: relax requirement for non null uuid of lower fs
> > > >
> > > > Which enabled nfs_export for overlay with lower squashfs.
> > > >
> > > > If you do not need to export overlayfs to NFS, then you can check if the
> > > > attached patch solves your problem.
> > >
> > > With the attached patch I'm now getting to a point where the overlayfs
> > > tries to handle the /run directory (a symlink). There seems to be a
> > > -ESTALE at ovl_check_origin_fh() after the for-loop where it checks if
> > > origin was not found ( if (!origin) ). Maybe I should debug for more
> > > details here? Please let me know.
> >
> > This is expected. Does it cause any problem?
> >
> > The patch marks the lower squashfs as "bad_uuid", because:
> >         if (!ofs->config.index && uuid_is_null(uuid))
> >                 return false;
> > ...
> >         if (!ovl_lower_uuid_ok(ofs, &sb->s_uuid)) {
> >                 bad_uuid = true;
> > ...
> >         ofs->fs[ofs->numfs].bad_uuid = bad_uuid;
> >
> > That's ofs->fs[1].bad_uuid = bad_uuid;
> >
> >
> > Then in ovl_lookup() => ovl_check_origin() => ovl_check_origin_fh()
> > will return -ESTALE because of:
> >                 if (ofs->layers[i].fsid &&
> >                     ofs->layers[i].fs->bad_uuid)
> >                         continue;
> >
> > And ovl_check_origin() will return 0 to ovl_lookup().
>
> I'm sorry. You are totally right!
> RootFS now completely comes up - just missed
> the console start in our latest inittab - so thought something still hangs.
> The ESTALE was printed for me because i debugged the whole ESTALE positions in
> the overlayfs code while studying the first problem. Time to remove my debug
> code...
>
> We will now continue with update tests. If we see something else i will let
> you know.
>

OK. please report back when done testing so I can add your tested-by

Thanks,
Amir.
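
For reference, the control flow described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
the kernel code: the model_* names and structs are hypothetical stand-ins,
only the checks quoted in this thread are reproduced, and the parts elided
with "..." are not modeled. It shows why a lower fs with a null UUID (with
index off) ends up as bad_uuid, why ovl_check_origin_fh() then reports
-ESTALE, and why the caller still returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; fields reduced to
 * what the quoted snippets touch. */
struct model_fs { bool bad_uuid; };
struct model_layer { int fsid; struct model_fs *fs; };

/* Models only the quoted check: with index off, a null UUID is not ok. */
static bool model_lower_uuid_ok(bool index_enabled, const unsigned char uuid[16])
{
    static const unsigned char zero[16];

    if (!index_enabled && memcmp(uuid, zero, sizeof(zero)) == 0)
        return false;
    return true;
}

/* Models the loop in ovl_check_origin_fh(): layers on a bad_uuid fs are
 * skipped, so no origin is found and -ESTALE is returned. */
static int model_check_origin_fh(const struct model_layer *layers, int numlayers)
{
    bool origin = false;

    assert(layers != NULL || numlayers == 0);
    for (int i = 0; i < numlayers; i++) {
        if (layers[i].fsid && layers[i].fs->bad_uuid)
            continue;
        origin = true; /* the real code decodes a file handle here */
        break;
    }
    return origin ? 0 : -ESTALE;
}

/* Models the caller: -ESTALE means "no origin" and is not an error. */
static int model_check_origin(const struct model_layer *layers, int numlayers)
{
    int err = model_check_origin_fh(layers, numlayers);

    if (err == -ESTALE)
        return 0;
    return err;
}
```

With a single lower layer whose fs has bad_uuid set, model_check_origin_fh()
returns -ESTALE while model_check_origin() returns 0, which matches the
"expected" ESTALE seen only in the debug output.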