Re: [PATCH v2 00/13] Overlayfs lazy lookup of lowerdata

Alexander Larsson <alexl@xxxxxxxxxx> · Fri, 26 May 2023 13:36:05 +0200

On Fri, May 26, 2023 at 7:12 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Thu, May 25, 2023 at 7:59 PM Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
> >
> > Hi Amir,
> >
> > Amir Goldstein <amir73il@xxxxxxxxx> writes:
> >
> > > On Thu, May 25, 2023 at 6:21 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > >>
> > >> Something that came up about this in a discussion recently was
> > >> multi-layer composefs style images. For example, this may be a useful
> > >> approach for multi-layer container images.
> > >>
> > >> In such a setup you would have one lowerdata layer, but two real
> > >> lowerdirs, like lowerdir=A:B::C. In this situation a file in B may
> > >> accidentally have the same name as a file on C, causing a redirect
> > >> from A to end up in B instead of C.
> > >>
> > >
> > > I was under the impression that the names of the data blobs in C
> > > are supposed to be content derived names (hash).
> > > Is this not the case or is the concern about hash conflicts?
> > >
> > >> Would it be possible to have a syntax for redirects that means "only
> > >> look up in lowerdata layers"? For example, a double-slash path
> > >> //some/file.
> > >>
> > >
> > > Anything is possible if we can define the problem that needs to be solved.
> > > In this case, I did not understand why the problem is limited to finding a file
> > > by mistake in layer B.
> > >
> > > If there are several data layers A:B::C:D why wouldn't we have the same
> > > problem with a file name collision between C and D?
> >
> > the data layer is constructed in a way that files are stored by their
> > hash and there is control from the container runtime on how this is
> > built and maintained.  So a file name collision would happen only on
> > a hash collision.
> >
> > Differently for the other layers we've no control on what files are in
> > the image, unless we limit to mount only one EROFS as the first lower
> > layer and then all the other lower layers are data layers.
> >
> > Given your example above A:B::C:D, if both A and B are EROFS we are
> > limited in the files/directories that can be in B.
> >
> > e.g. we have A/foo with the following xattrs:
> >
> > trusted.overlay.metacopy=""
> > trusted.overlay.redirect="/1e/de1743e73b904f16924c04fbd0b7fbfb7e45b8640241e7a08779e8f38fc20d"
> >
> > Now what would happen if /1e is present as a file in layer B?  It will
> > just cause the lookup for `foo` to fail with EIO since the redirect
> > didn't find any file in the layers below.
> >
> >
>
> I understand the problem and I understand why a // redirect to data-only layers
> would be a simple and workable solution for composefs.
>
> Unlike the rest of the changes to overlayfs that we worked on to support
> composefs, this would really be a composefs only on-disk format because it
> could not be generated by overlayfs itself, so we need Miklos to chime in to
> say if this is acceptable.
>
> I have one question though:
>
> If you place all the blobs under /.cfs/1e/... is that really going to
> be an issue?
> I mean the middle layers are not random stuff and I don't think we need to worry
> about malicious images blocking the data layers, because malicious images have
> much easier ways of making damage.

I think if you make the prefix (the /.cfs part) "weird" enough it will
only become an issue for the malicious layer case. For example, a
malicious layer could inject a file that was typically unused, but if
an upper layer happened to use a particular redirect path it would get
unexpected content for its redirect.

However, you wouldn't be able to inject such a base layer after the
fact; it would already have to be in the base layer used when building
the upper layer. So in this case I agree: a malicious layer that is
already in use at image build time can do malicious things in much
easier ways anyway.
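
To make the two failure modes from this thread concrete, here is a toy
model in Python. It is only a sketch under simplifying assumptions, not
the overlayfs lookup code: layers are plain dicts from path to content,
the Layer class and lookup_redirect() are hypothetical names, and the
leading "//" is the syntax proposed above, not an existing feature.

```python
import errno


class Layer:
    """Hypothetical stand-in for one lower layer: path -> file content."""

    def __init__(self, name, files, data_only=False):
        self.name = name
        self.files = files
        self.data_only = data_only


def lookup_redirect(layers, redirect):
    """Resolve a metacopy redirect against the layers below it (toy model).

    A redirect starting with "//" (proposed syntax) only consults
    data-only layers; a normal redirect consults every layer in order.
    """
    data_only = redirect.startswith("//")
    parts = redirect.strip("/").split("/")
    path = "/" + "/".join(parts)
    for layer in layers:
        if data_only and not layer.data_only:
            continue
        # A regular file sitting where a directory is expected blocks the
        # walk, which is the "/1e is a file in B" case from the thread.
        for i in range(1, len(parts)):
            prefix = "/" + "/".join(parts[:i])
            if prefix in layer.files:
                raise OSError(errno.EIO, f"{prefix} is a file in {layer.name}")
        if path in layer.files:
            return layer.name, layer.files[path]
    raise OSError(errno.EIO, "redirect target not found")
```

With lowerdir=A:B::C, a file in B at the blob path shadows the blob in
C, and a file at /1e in B makes the plain redirect fail with EIO, while
the same path written as "//1e/..." skips B in both cases and reaches C.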

> It doesn't seem so likely for images to overload /.cfs by mistake with
> anything other
> than a proper composefs blobs repository (which should have no hash conflicts)
> and in the unlikely case that the image does happen to have a rogue /.cfs file
> the container runtime can declare that image invalid for composefs mount.
>
> Another observation: this problem reminds me of the "follow origin" [1] feature
> I was working on a long time ago.
>
> [1] https://github.com/amir73il/linux/commits/ovl-follow-origin
>
> This work had a different use case and the patches (that follow directories)
> are not relevant for this use case, but the same principle could be applied
> to following metacopy to lowerdata by origin when the lowerdata cannot be
> found by redirect (or redirected to "" to signify disconnected path).
>
> Overlayfs copied up files and metacopy in particular have an "origin" xattr.
> The "origin" xattr holds a file handle of the origin inode and uuid of
> the origin layer.
> This xattr has some uses, but I would like to point out one particular use -
> in ovl_lookup() we use ovl_check_origin() to find a lower non-dir, so
> that we can
> use its i_ino for the overlay inode.
>
> We do that also for non-metacopy and non-redirect. The special thing about
> ovl_check_origin() is that it is blind to the problem that you described.
> Unless the origin inode was deleted from the filesystem, overlayfs will find it.
> The path of the found "origin" may not be known to overlayfs (i.e. disconnected
> dentry), but it also does not matter to overlayfs.
>
> The reason I am telling you about this is because in the case that composefs
> is used to create composefs layers on a specific machine or in a specific
> local network, where the blobs are going to be stored on a specific
> shared backing
> filesystem, using "origin" instead of "redirect" as the way to refer
> to the lower-data
> might not be such a bad idea.
>
> For example, in LSFMM, Lennart presented the idea of a system service that would
> be able to provide "signed composefs image creation" services for unprivileged
> containers from standard OCI images. In that case, creating images
> optimized for a
> specific local blobs repository could be an option.
>
> Apart from enabling composefs mount for unprivileged containers, the centric
> system service scheme could also be used to "optimize" distributed composefs
> images to the local storage (i.e. convert //data redirect to origin reference).
>
> Combined with Alex's verity feature, I think we could allow following lowerdata
> by "origin" without any further opt-in from user, meaning:
> If lowerdata cannot be found by path (or has intentional "" redirect)
> AND if metacopy has verity xattr, defer to lazy lowerdata lookup,
> where lowerdata would be looked up by origin.
>
> Does this sound like something that would be useful for composefs ecosystem?
> If it is, I could send a patch for testing.

I can't think of any use for this currently. We very rarely work with
images using a backing store tied to a particular machine like this.
Generally the goal is to have something you can distribute.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@xxxxxxxxxx         alexander.larsson@xxxxxxxxx