Re: [PATCH v2 00/13] Overlayfs lazy lookup of lowerdata

Amir Goldstein <amir73il@xxxxxxxxx> · Sat, 27 May 2023 17:04:00 +0300

On Fri, May 26, 2023 at 9:27 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/5/26 04:36, Alexander Larsson wrote:
> > On Fri, May 26, 2023 at 7:12 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >>
> >> On Thu, May 25, 2023 at 7:59 PM Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
> >>>
> >>> Hi Amir,
> >>>
> >>> Amir Goldstein <amir73il@xxxxxxxxx> writes:
> >>>
> >>>> On Thu, May 25, 2023 at 6:21 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> Something that came up about this in a discussion recently was
> >>>>> multi-layer composefs style images. For example, this may be a useful
> >>>>> approach for multi-layer container images.
> >>>>>
> >>>>> In such a setup you would have one lowerdata layer, but two real
> >>>>> lowerdirs, like lowerdir=A:B::C. In this situation a file in B may
> >>>>> accidentally have the same name as a file on C, causing a redirect
> >>>>> from A to end up in B instead of C.
> >>>>>
> >>>>
> >>>> I was under the impression that the names of the data blobs in C
> >>>> are supposed to be content derived names (hash).
> >>>> Is this not the case or is the concern about hash conflicts?
> >>>>
> >>>>> Would it be possible to have a syntax for redirects that means "only
> >>>>> lookup in lowerdata layers"? For example, a double-slash path
> >>>>> //some/file.
> >>>>>
> >>>>
> >>>> Anything is possible if we can define the problem that needs to be solved.
> >>>> In this case, I did not understand why the problem is limited to finding a file
> >>>> by mistake in layer B.
> >>>>
> >>>> If there are several data layers A:B::C:D why wouldn't we have the same
> >>>> problem with a file name collision between C and D?
> >>>
> >>> the data layer is constructed in a way that files are stored by their
> >>> hash and there is control from the container runtime on how this is
> >>> built and maintained.  So a file name collision would happen only
> >>> on a hash collision.
> >>>
> >>> For the other layers, by contrast, we have no control over what files
> >>> are in the image, unless we limit ourselves to mounting only one EROFS
> >>> as the first lower layer, with all the other lower layers being data layers.
> >>>
> >>> Given your example above A:B::C:D, if both A and B are EROFS we are
> >>> limited in the files/directories that can be in B.
> >>>
> >>> e.g. we have A/foo with the following xattrs:
> >>>
> >>> trusted.overlay.metacopy=""
> >>> trusted.overlay.redirect="/1e/de1743e73b904f16924c04fbd0b7fbfb7e45b8640241e7a08779e8f38fc20d"
> >>>
> >>> Now what would happen if /1e is present as a file in layer B?  It will
> >>> just cause the lookup for `foo` to fail with EIO since the redirect
> >>> didn't find any file in the layers below.
> >>>
> >>>
> >>
> >> I understand the problem and I understand why a // redirect to data-only layers
> >> would be a simple and workable solution for composefs.
> >>
> >> Unlike the rest of the changes to overlayfs that we worked on to support
> >> composefs, this would really be a composefs only on-disk format because it
> >> could not be generated by overlayfs itself, so we need Miklos to chime in to
> >> say if this is acceptable.
>
> An alternative way might allow data-only layers (or invisible layers) in the
> middle rather than as the tail?
>

Anything is possible if you can justify its worth.

> I'm not sure in the long term if it's flexible to fix data-only layers as the
> bottom-most layers for future potential use cases.
>
> At a quick glance, I've seen that the implementation of this patchset also
> strictly hard-codes that.  I wonder whether using non-fixed invisible layers
> would increase the complexity, or am I still missing something?
>

The current implementation is considerably simpler because data-only
layers are kept at the tail, and simpler still because lazy lowerdata
lookup is done only in the data-only layers at the tail of the stack.
The documentation is simpler too, as are the tests.

Making all of the above more complex needs justification, and so far
I have not seen any use case that would justify it, because the /.cfs
workaround is good enough IMO.

That leaves the question - is the design/API flexible enough to be
extended in the future if we need to?

If we wanted to support data-only layers in the middle of the
stack, would this syntax make sense?
lowerdir=lower1::data1:lower2::data2
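
To make the question concrete, here is a rough sketch (Python, not kernel
code) of one possible reading of such a string, where every "::" marks the
layer that follows it as data-only and every ":" marks it as a regular
lower layer. The name parse_lowerdir and the (path, data_only) tuples are
made up for illustration, and backslash-escaped colons are ignored for
brevity.

```python
import re

def parse_lowerdir(spec):
    """Hypothetical reading: a layer preceded by '::' is data-only."""
    # The capturing group keeps the separators; '::' is tried before ':'.
    tokens = re.split(r"(::|:)", spec)
    layers = []
    data_only = False  # the first layer is never data-only
    for idx, tok in enumerate(tokens):
        if idx % 2 == 1:
            # Odd positions hold separators; remember which kind we saw.
            data_only = (tok == "::")
            continue
        if not tok:
            raise ValueError("empty layer name in %r" % spec)
        layers.append((tok, data_only))
    return layers

for path, data_only in parse_lowerdir("lower1::data1:lower2::data2"):
    print(path, "data-only" if data_only else "regular")
```

Under this reading data1 and data2 are data-only, while lower1 and lower2
are regular layers.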

If this syntax makes sense to everyone, then we can change the syntax
of data-only in the tail from lower1::data1:data2 to lower1::data1::data2
and enforce that after the first ::, only :: separators are allowed.
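
The enforcement rule could look roughly like the sketch below (again
Python for illustration only; data_only_in_tail is a hypothetical name,
and escaped colons are not handled). It only inspects the separators:
once a "::" has been seen, a plain ":" is rejected.

```python
import re

def data_only_in_tail(spec):
    """Return True if no single ':' separator follows the first '::'."""
    # Collect separators in order; '::' is matched before ':'.
    seps = re.findall(r"::|:", spec)
    if "::" not in seps:
        return True  # no data-only layers at all
    first = seps.index("::")
    return all(sep == "::" for sep in seps[first:])

for spec in ("lower1::data1::data2",
             "lower1::data1:data2",
             "lower1::data1:lower2::data2"):
    print(spec, "accepted" if data_only_in_tail(spec) else "rejected")
```

With this check the old tail form lower1::data1:data2 and the
middle-of-stack form are both rejected, which leaves room to give them a
meaning later.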

Miklos, any thoughts?
I have a feeling that this was your natural interpretation when you first
saw the :: syntax.

Thanks,
Amir.