Re: overlayfs: issue with a replaced lower squashfs with export-table

Vivek Goyal <vgoyal@xxxxxxxxxx> · Tue, 7 Jul 2020 11:51:59 -0400

On Tue, Jul 07, 2020 at 08:51:53AM +0300, Amir Goldstein wrote:
> On Mon, Jul 6, 2020 at 6:14 PM Fabian <godi.beat@xxxxxxx> wrote:
> >
> > Hi Amir,
> >
> > thanks for your mail and the quick reply!
> >
> > Am Montag, 6. Juli 2020, 16:29:51 CEST schrieb Amir Goldstein:
> > > > We are seeing problems using an read-writeable overlayfs (upper) on a
> > > > readonly squashfs (lower). The squashfs gets an update from time to time
> > > > while we keep the upper overlayfs.
> > >
> > > It gets updated while the overlay is offline (not mounted) correct?
> >
> > Yes. We boot into a recovery system outside the rootfs and its overlayfs,
> > replace the lower squashfs, and then reboot into the new system.
> >
> > > > On replaced files we then see -ESTALE ("overlayfs: failed to get inode
> > > > (-116)") messages if the lower squashfs was created _without_ using the
> > > > "-no-exports" switch.
> > > > The -ESTALE comes from ovl_get_inode() which in turn calls
> > > > ovl_verify_inode() and returns on the line where the upperdentry inode
> > > > gets compared
> > > > ( if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) ).
> > > >
> > > > A little debugging shows, that the upper files dentry name does not fit to
> > > > the dentry name of the new lower dentry as it seems to look for the inode
> > > > on the squashfs "export"-lookup-table which has changed as we replaced
> > > > the lower fs.
> > > >
> > > > Building the lower squashfs with the "-no-exports"-mksquashfs option, so
> > > > without the export-lookup-table, seems to work, but it might be no longer
> > > > exportable using nfs (which is ok and we can keep with it).
> 
> Miklos,
> 
> At first glance I did not understand how changing lower file handles causes
> failure to ovl_verify_inode().
> To complete the picture, here is the explanation.
> 
> Upper file A was copied up from lower file with inode 10 in old squashfs
> and the "origin" file handle composed of the inode number 10 is recorded
> in upper file A.
> 
> With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> Upper file B is copied from lower file B with inode 10 in new squashfs and
> the "origin" file handle composed of the inode number 10 is recorded
> in upper file B.
> Now we have two upper files with the same "origin" that are not hardlinks.
> 
> On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> file B (inode 10) as the lower inode.
> This lower inode is used to get the overlay inode number (10) and as
> the key to hash overlay inode in inode cache.
> 
> Suppose A is looked up first and it's inode is hashed.
> Then B is looked up and in ovl_get_inode() it finds the inode hashed
> by the same lower inode in inode cache, but fails ovl_verify_inode()
> because:
> d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> 
> This can also happen when copying overlay layers to a new
> fs tree and carrying over the old "origin" xattr.
> In practice, the UUID part of the stored "origin" xattr is meant to
> protect against decoding lower fh when migrating to another
> filesystem, but layers could be migrated inside the same filesystem.
> Since squashfs does not have a UUID, re-creating sqhashfs is similar
> to migrating layers inside the same filesystem.

Hi Amir,

So we can't use "origin" if lower layers have been copied. If they
have been copied to a different filesystem with uuid we seem to
have a mechanism to detect it but otherwise not and we can run
into these kind of issues.

My question is, why do we allow copying or updating lower layers
with same upper when we know this will break stored origin in
upper. Can't I do same thing with ext4/xfs, where I recreate
lower layers when inode numbers get exchanged  and same problem
will happen (despite uuid being same). 

IOW, how can we support copying layers (with same upper) while origin
is in use. Rest of the features are built on top of the assumption
that origin is valid. And in case of copying layers, we don't
seem to have a sure way to find if origin is valid or not.

Should we put a restrictions that if lower layers are updated or
moved then upper should be rotated as lower layer and a new
fresh upper should be used instead?

I might be missing something very fundamental, but before I try
to understand fine details of all the features built on top of
"origin", I fail to understand that how can we allow changing
lower layers and still expect "origin" to be valid and use it.

Thanks
Vivek
> 
> We were aware of the "layer migration" case when designing the
> index/nfs_export feature, which is one of the reasons they are
> opt-in features.
> 
> But we enabled the functionality of following non-dir origin
> unconditionally because we *thought* it is harmless, as the comment
> in ovl_lookup() says:
> 
>          /*
>          * Lookup copy up origin by decoding origin file handle.
>          * We may get a disconnected dentry, which is fine,
>          * because we only need to hold the origin inode in
>          * cache and use its inode number.  We may even get a
>          * connected dentry, that is not under any of the lower
>          * layers root.  That is also fine for using it's inode
>          * number - it's the same as if we held a reference
>          * to a dentry in lower layer that was moved under us.
>          */
> 
> The patch I posted disabled decoding of non-dir origin for the special
> case of lower null uuid.
> 
> I think we can also warn and auto-disable decoding non-dir origin in
> case index is disabled and we detect this upper inode conflict in
> ovl_verify_inode().
> 
> The problem is if A is not metacopy and looked up first, and B is
> metacopy and looked up second, then conflict will be deleted after
> the wrong inode has been hashed.
> 
> Perhaps we should disable decoding non-dir origin with in case
> metacopy=on,index=off?
> Maybe also provide a user option to disable decoding non-dir origin
> at the price of losing persistent inode number for copied up non-dir?
> Something like 'index=nofollow'.
> 
> Thoughts?
> Am I overthinking this?
> 
> Thanks,
> Amir.
>