Re: overlayfs: issue with a replaced lower squashfs with export-table

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 8 Jul 2020 09:55:32 +0300

On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote:
> > > > Miklos,
> > > >
> > > > At first glance I did not understand how changing lower file handles causes
> > > > failure to ovl_verify_inode().
> > > > To complete the picture, here is the explanation.
> > > >
> > > > Upper file A was copied up from lower file with inode 10 in old squashfs
> > > > and the "origin" file handle composed of the inode number 10 is recorded
> > > > in upper file A.
> > > >
> > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10.
> > > > Upper file B is copied from lower file B with inode 10 in new squashfs and
> > > > the "origin" file handle composed of the inode number 10 is recorded
> > > > in upper file B.
> > > > Now we have two upper files with the same "origin" that are not hardlinks.
> > > >
> > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower
> > > > file B (inode 10) as the lower inode.
> > > > This lower inode is used to get the overlay inode number (10) and as
> > > > the key to hash overlay inode in inode cache.
> > > >
> > > > Suppose A is looked up first and it's inode is hashed.
> > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed
> > > > by the same lower inode in inode cache, but fails ovl_verify_inode()
> > > > because:
> > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */
> > > >
> > > > This can also happen when copying overlay layers to a new
> > > > fs tree and carrying over the old "origin" xattr.
> > > > In practice, the UUID part of the stored "origin" xattr is meant to
> > > > protect against decoding lower fh when migrating to another
> > > > filesystem, but layers could be migrated inside the same filesystem.
> > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar
> > > > to migrating layers inside the same filesystem.
> > >
> > > Hi Amir,
> > >
> > > So we can't use "origin" if lower layers have been copied. If they
> > > have been copied to a different filesystem with uuid we seem to
> > > have a mechanism to detect it but otherwise not and we can run
> > > into these kind of issues.
> > >
> >
> > Correct.
> >
> > > My question is, why do we allow copying or updating lower layers
> > > with same upper when we know this will break stored origin in
> > > upper.
> >
> > I don't know if we "allow" this.
>
> So there seem to two cases. One is copying the lower layers to same
> filesystem or a different filesystem. And another case is recreating
> the lower layers and use previous upper with old upper. IIUC, we
> are currently facing problem with second scenario.
>
> "copying the lower layers" is probably fine as you said because old
> tree still keeps that inode number busy and newly created inode
> should not acquire that number.
>
> But in this case looks like we recreated lower and that led to
> some file B acquiring an inode number which was used by A. So
> this is a different use case. IIUC, simply copying the layer
> will not lead to this situation.
>
> Now question is do we support recreating the lower layer with existing
> upper. And I see following text in "Sharing and copying layer"
>
> "Mounting an overlay using an upper layer path, where the upper layer path
> was previously used by another mounted overlay in combination with a
> different lower layer path, is allowed, unless the "inodes index" feature
> or "metadata only copy up" feature is enabled."
>
> This seems to suggest that recreating lower layer is allowed as long
> as you are not using index or metacopy feature.
>
> If that's the case, probably "origin" should have been an opt-in
> feature or automatically be enabled by some other opt-in feature.
>
>

Yes. "should have been". That is my point.
We thought it was a victimless crime to enable "origin" without opt-in
and I think we were wrong.

> > We never considered the case expect
> > for nfs export and index, see overlayfs.rst:
> > "When the overlay NFS export feature is enabled, overlay filesystems
> > behavior on offline changes of the underlying lower layer is different
> > than the behavior when NFS export is disabled. ..."
> >
> > > Can't I do same thing with ext4/xfs, where I recreate
> > > lower layers when inode numbers get exchanged  and same problem
> > > will happen (despite uuid being same).
> > >
> >
> > Same problem.
>
> >
> > > IOW, how can we support copying layers (with same upper) while origin
> > > is in use. Rest of the features are built on top of the assumption
> > > that origin is valid. And in case of copying layers, we don't
> > > seem to have a sure way to find if origin is valid or not.
> > >
> >
> > With index/nfs_export enabled we at least do:
> > /* Verify lower root is upper root origin */
> > and if verification fails we disable the feature.
>
> So discrepancy is still possible if somebody modifies lower layers
> without changing lower root.
>
> - Copy up file A.
> - Unmount overlay
> - unlink A
> - create B (assume B gets same inode number as A).
> - mount overlay; B gets copied up.
>
> And now we have both upper A and B having same origin despite no
> hardlink. Am I understanding it right?
>

Yes.

> Is there a good use case for allowing modifying lower layers with
> same upper. Given we are adding more complex features to overlayfs,
> will it make sense to not allow modifying lower layer going forward.
> We might not be able to detect it but atleast it will be unsupported
> configuration. And then we can only focus to provide a work around
> for existing use cases.
>

Agreed.

> [..]
> > > >
> > > > We were aware of the "layer migration" case when designing the
> > > > index/nfs_export feature, which is one of the reasons they are
> > > > opt-in features.
> > > >
> > > > But we enabled the functionality of following non-dir origin
> > > > unconditionally because we *thought* it is harmless, as the comment
> > > > in ovl_lookup() says:
> > > >
> > > >          /*
> > > >          * Lookup copy up origin by decoding origin file handle.
> > > >          * We may get a disconnected dentry, which is fine,
> > > >          * because we only need to hold the origin inode in
> > > >          * cache and use its inode number.  We may even get a
> > > >          * connected dentry, that is not under any of the lower
> > > >          * layers root.  That is also fine for using it's inode
> > > >          * number - it's the same as if we held a reference
> > > >          * to a dentry in lower layer that was moved under us.
> > > >          */
> > > >
> > > > The patch I posted disabled decoding of non-dir origin for the special
> > > > case of lower null uuid.
> > > >
> > > > I think we can also warn and auto-disable decoding non-dir origin in
> > > > case index is disabled and we detect this upper inode conflict in
> > > > ovl_verify_inode().
> > > >
> > > > The problem is if A is not metacopy and looked up first, and B is
> > > > metacopy and looked up second, then conflict will be deleted after
> > > > the wrong inode has been hashed.
>
> We don't allow modifying lower layers if metacopy is enabled.
>
> "For "metadata only copy up" feature there is no verification mechanism at
> mount time. So if same upper is mounted with different set of lower, mount
> probably will succeed but expect the unexpected later on. So don't do it.
> "
>
> So if somebody is recreating lower or modifying lower with metacopy
> on, its an unsupported configuration.
>

Very good. So this case is settled.

Now, suggestions for work arounds:

1. Don't follow with lower null uuid (patch posted) - no caveats
2. Opt-out of following origin with explicit option e.g. "index=nofollow"
3. Don't follow origin unless one of the following opt-in features:
    metacopy,index,xino

If we go for option #3, the easiest recommendation for distros
would be to set:
CONFIG_OVERLAY_FS_XINO_AUTO=y

Apart from "compatibility with applications that expect 32bit inodes"
I am not aware of any caveats to enabling XINO_AUTO.

The purpose of xino feature is to make overlay st_ino/st_dev more
posix-like (currently only for non samefs), so it makes some sense
to tie the "origin" xattr feature that preserves the pre copy up
persistent st_ino to xino.

The xino documentation does not mention the samefs exemption:
"...If this feature is disabled or the underlying filesystem doesn't have
enough free bits in the inode number, then overlayfs will not be able to
guarantee that the values of st_ino and st_dev returned by stat(2) and the
value of d_ino returned by readdir(3) will act like on a normal filesystem."

and the 'xino' mount option is currently not displayed with samefs.

If we implement option #3, then with samefs user could enable xino
if default is off and disable xino if default is auto and display xino in
mount options.

Thanks,
Amir.