Re: [PATCH v2 00/13] Overlayfs lazy lookup of lowerdata

Amir Goldstein <amir73il@xxxxxxxxx> · Fri, 26 May 2023 08:12:29 +0300

On Thu, May 25, 2023 at 7:59 PM Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
>
> Hi Amir,
>
> Amir Goldstein <amir73il@xxxxxxxxx> writes:
>
> > On Thu, May 25, 2023 at 6:21 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> >>
> >> Something that came up about this in a discussion recently was
> >> multi-layer composefs style images. For example, this may be a useful
> >> approach for multi-layer container images.
> >>
> >> In such a setup you would have one lowerdata layer, but two real
> >> lowerdirs, like lowerdir=A:B::C. In this situation a file in B may
> >> accidentally have the same name as a file on C, causing a redirect
> >> from A to end up in B instead of C.
> >>
> >
> > I was under the impression that the names of the data blobs in C
> > are supposed to be content derived names (hash).
> > Is this not the case or is the concern about hash conflicts?
> >
> >> Would it be possible to have a syntax for redirects that means "only
> >> look up in lowerdata layers"? For example, a double-slash path
> >> //some/file.
> >>
> >
> > Anything is possible if we can define the problem that needs to be solved.
> > In this case, I did not understand why the problem is limited to finding a file
> > by mistake in layer B.
> >
> > If there are several data layers A:B::C:D why wouldn't we have the same
> > problem with a file name collision between C and D?
>
> the data layer is constructed in a way that files are stored by their
> hash and there is control from the container runtime on how this is
> built and maintained.  So a file name collision would happen only on
> a hash collision.
>
> For the other layers, by contrast, we have no control over what files
> are in the image, unless we limit ourselves to mounting only one EROFS
> as the first lower layer, with all the other lower layers being data
> layers.
>
> Given your example above A:B::C:D, if both A and B are EROFS we are
> limited in the files/directories that can be in B.
>
> e.g. we have A/foo with the following xattrs:
>
> trusted.overlay.metacopy=""
> trusted.overlay.redirect="/1e/de1743e73b904f16924c04fbd0b7fbfb7e45b8640241e7a08779e8f38fc20d"
>
> Now what would happen if /1e is present as a file in layer B?  It will
> just cause the lookup for `foo` to fail with EIO since the redirect
> didn't find any file in the layers below.
>
>

I understand the problem and I understand why a // redirect to data-only layers
would be a simple and workable solution for composefs.
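
To make the failure mode concrete, here is a toy model of the redirect
lookup. It is only a sketch and not the kernel algorithm: layers are plain
dicts mapping a path to "dir" or "file", the function name and the short
blob name "de17" are made up for illustration, and the "//" handling is
the syntax proposed in this thread, not something overlayfs implements.

```python
def resolve_redirect(redirect, lower_layers, first_data_only):
    """Return the index of the layer that holds `redirect`, or None.

    Toy model: `lower_layers` lists the layers below the one carrying the
    redirect, top-most first; entries from `first_data_only` onward stand
    in for data-only layers.
    """
    if redirect.startswith("//"):
        # Hypothetical "//" syntax: search the data-only layers only.
        start, path = first_data_only, redirect[1:]
    else:
        start, path = 0, redirect
    parts = path.strip("/").split("/")
    for idx in range(start, len(lower_layers)):
        layer = lower_layers[idx]
        # A non-directory on a parent component ends the lookup here.
        for depth in range(1, len(parts)):
            if layer.get("/" + "/".join(parts[:depth])) == "file":
                return None
        if layer.get("/" + "/".join(parts)) == "file":
            return idx
    return None


# lowerdir=A:B::C, with the redirect stored on a file in A.
layer_b = {"/1e": "file"}                       # rogue file in the image
layer_c = {"/1e": "dir", "/1e/de17": "file"}    # data-only blob store

blocked = resolve_redirect("/1e/de17", [layer_b, layer_c], 1)
data_only = resolve_redirect("//1e/de17", [layer_b, layer_c], 1)
```

In this model the plain redirect is blocked by /1e in B, while the "//"
form skips B and finds the blob in C.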

Unlike the rest of the changes to overlayfs that we worked on to support
composefs, this would really be a composefs-only on-disk format, because it
could not be generated by overlayfs itself, so we need Miklos to chime in
and say whether this is acceptable.

I have one question though:

If you place all the blobs under /.cfs/1e/... is that really going to
be an issue?
I mean, the middle layers are not random stuff, and I don't think we need
to worry about malicious images blocking the data layers, because malicious
images have much easier ways of doing damage.

It doesn't seem so likely for images to overload /.cfs by mistake with
anything other than a proper composefs blob repository (which should have
no hash conflicts), and in the unlikely case that the image does happen to
have a rogue /.cfs file, the container runtime can declare that image
invalid for composefs mount.
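
As a sketch of what such a runtime check could look like (the function
name, the flat list of paths and the strict "nothing under /.cfs" policy
are all assumptions made for illustration, not an existing composefs or
container runtime API):

```python
def image_valid_for_composefs(image_paths, blob_root="/.cfs"):
    """Reject an image layer that ships anything at or under blob_root.

    `image_paths` is assumed to be the list of absolute paths found in a
    regular (non-data) layer of the image.
    """
    prefix = blob_root + "/"
    return not any(p == blob_root or p.startswith(prefix) for p in image_paths)


clean_image = ["/etc", "/etc/passwd", "/usr/bin/true"]
rogue_image = ["/etc", "/.cfs"]
```

A runtime that finds a rogue image could fall back to a mount scheme
that does not use data-only layers.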

Another observation: this problem reminds me of the "follow origin" [1] feature
I was working on a long time ago.

[1] https://github.com/amir73il/linux/commits/ovl-follow-origin

This work had a different use case and the patches (that follow directories)
are not relevant for this use case, but the same principle could be applied
to following metacopy to lowerdata by origin when the lowerdata cannot be
found by redirect (or redirected to "" to signify disconnected path).

Overlayfs copied-up files, and metacopy files in particular, have an
"origin" xattr. The "origin" xattr holds a file handle of the origin inode
and the uuid of the origin layer.
This xattr has some uses, but I would like to point out one particular use:
in ovl_lookup() we use ovl_check_origin() to find a lower non-dir, so that
we can use its i_ino for the overlay inode.

We do that also for non-metacopy and non-redirect. The special thing about
ovl_check_origin() is that it is blind to the problem that you described.
Unless the origin inode was deleted from the filesystem, overlayfs will find it.
The path of the found "origin" may not be known to overlayfs (i.e. disconnected
dentry), but it also does not matter to overlayfs.

I am telling you about this because, in the case where composefs is used
to create composefs layers on a specific machine or in a specific local
network, where the blobs are going to be stored on a specific shared
backing filesystem, using "origin" instead of "redirect" as the way to
refer to the lowerdata might not be such a bad idea.

For example, in LSFMM, Lennart presented the idea of a system service that
would be able to provide "signed composefs image creation" services for
unprivileged containers from standard OCI images. In that case, creating
images optimized for a specific local blob repository could be an option.

Apart from enabling composefs mount for unprivileged containers, the
centralized system service scheme could also be used to "optimize"
distributed composefs images for the local storage (i.e. convert a //data
redirect to an origin reference).

Combined with Alex's verity feature, I think we could allow following lowerdata
by "origin" without any further opt-in from the user, meaning:
If lowerdata cannot be found by path (or has intentional "" redirect)
AND if metacopy has verity xattr, defer to lazy lowerdata lookup,
where lowerdata would be looked up by origin.
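
Spelled out as a decision function, the rule above would be roughly the
following. This is only an illustration of the proposed policy: the
function and argument names are made up, and nothing like this exists in
overlayfs today.

```python
def lowerdata_lookup_method(redirect, found_by_path, has_verity):
    """Pick how lowerdata would be located under the proposed rule.

    redirect      -- value of the redirect xattr ("" means an intentional
                     disconnected path)
    found_by_path -- whether the lookup by redirect path succeeded
    has_verity    -- whether the metacopy file carries a verity xattr
    """
    if redirect and found_by_path:
        return "path"
    if has_verity:
        # Defer to lazy lowerdata lookup and follow the origin file handle.
        return "origin"
    return "fail"
```

Requiring the verity xattr is what would let this work without an extra
opt-in: the data found by origin can still be checked against the digest
recorded in the metacopy file.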

Does this sound like something that would be useful for the composefs
ecosystem? If it is, I could send a patch for testing.

Thanks,
Amir.