On Fri, May 26, 2023 at 7:12 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Thu, May 25, 2023 at 7:59 PM Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
> >
> > Hi Amir,
> >
> > Amir Goldstein <amir73il@xxxxxxxxx> writes:
> >
> > > On Thu, May 25, 2023 at 6:21 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > >>
> > >> Something that came up about this in a discussion recently was
> > >> multi-layer composefs style images. For example, this may be a useful
> > >> approach for multi-layer container images.
> > >>
> > >> In such a setup you would have one lowerdata layer, but two real
> > >> lowerdirs, like lowerdir=A:B::C. In this situation a file in B may
> > >> accidentally have the same name as a file on C, causing a redirect
> > >> from A to end up in B instead of C.
> > >>
> > >
> > > I was under the impression that the names of the data blobs in C
> > > are supposed to be content derived names (hash).
> > > Is this not the case or is the concern about hash conflicts?
> > >
> > >> Would it be possible to have a syntax for redirects that mean "only
> > >> lookup in lowerdata layers. For example a double-slash path
> > >> //some/file.
> > >>
> > >
> > > Anything is possible if we can define the problem that needs to be solved.
> > > In this case, I did not understand why the problem is limited to finding a file
> > > by mistake in layer B.
> > >
> > > If there are several data layers A:B::C:D why wouldn't we have the same
> > > problem with a file name collision between C and D?
> >
> > the data layer is constructed in a way that files are stored by their
> > hash and there is control from the container runtime on how this is
> > built and maintained. So a file name collision would happen only when
> > on a hash collision.
> >
> > Differently for the other layers we've no control on what files are in
> > the image, unless we limit to mount only one EROFS as the first lower
> > layer and then all the other lower layers are data layers.
> >
> > Given your example above A:B::C:D, if both A and B are EROFS we are
> > limited in the files/directories that can be in B.
> >
> > e.g. we have A/foo with the following xattrs:
> >
> > trusted.overlay.metacopy=""
> > trusted.overlay.redirect="/1e/de1743e73b904f16924c04fbd0b7fbfb7e45b8640241e7a08779e8f38fc20d"
> >
> > Now what would happen if /1e is present as a file in layer B? It will
> > just cause the lookup for `foo` to fail with EIO since the redirect
> > didn't find any file in the layers below.
> >
> >
>
> I understand the problem and I understand why a // redirect to data-only layers
> would be a simple and workable solution for composefs.
>
> Unlike the rest of the changes to overlayfs that we worked on to support
> composefs, this would really be a composefs only on-disk format because it
> could not be generated by overlayfs itself, so we need Miklos to chime in to
> say if this is acceptable.
>
> I have one question though:
>
> If you place all the blobs under /.cfs/1e/... is that really going to
> be an issue?
> I mean the middle layers are not random stuff and I don't think we need to worry
> about malicious images blocking the data layers, because malicious images have
> much easier ways of making damage.

I think if you make the prefix (the /.cfs part) "weird" enough, it only
becomes an issue in the malicious layer case. For example, a malicious
layer could inject a file that is normally unused; if an upper layer
then happened to use that particular redirect path, it would get
unexpected content for its redirect. However, you couldn't inject such
a file after the fact: it would have to be present already in the base
layer used when building the upper layer. So I agree: a malicious layer
that is already in use at image build time can do malicious things in
much easier ways anyway.
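To make the failure mode discussed above concrete, here is a toy Python
model of the redirect lookup. It is only a sketch under stated
assumptions, not how overlayfs is implemented: Layer, probe() and
follow_redirect() are hypothetical names, each layer is just a set of
regular-file paths, and the "//" handling models the syntax proposed
earlier in this thread, which is not an existing overlayfs feature.

```python
import errno


class Layer:
    """Toy layer: a name, a set of regular-file paths, a data-only flag."""

    def __init__(self, name, files, data_only=False):
        self.name = name
        self.files = set(files)
        self.data_only = data_only

    def probe(self, path):
        """Return "hit", "blocked" (a non-directory sits on the path) or "miss"."""
        if path in self.files:
            return "hit"
        parts = path.strip("/").split("/")
        for i in range(1, len(parts)):
            # A regular file where a directory is expected stops the walk.
            if "/" + "/".join(parts[:i]) in self.files:
                return "blocked"
        return "miss"


def follow_redirect(lower_layers, redirect):
    """Resolve a metacopy redirect against the layers below the metadata layer.

    Assumption: a redirect starting with "//" (the proposed syntax) is
    looked up in data-only layers only.
    """
    data_only = redirect.startswith("//")
    path = "/" + redirect.lstrip("/")
    for layer in lower_layers:
        if data_only and not layer.data_only:
            continue
        result = layer.probe(path)
        if result == "hit":
            return layer.name
        if result == "blocked":
            break
    raise OSError(errno.EIO, "redirect target not found", path)


# The redirect from Giuseppe's example, for lowerdir=A:B::C seen from A.
BLOB = "/1e/de1743e73b904f16924c04fbd0b7fbfb7e45b8640241e7a08779e8f38fc20d"
C = Layer("C", [BLOB], data_only=True)
B_blocking = Layer("B", ["/1e"])    # /1e exists as a regular file in B
B_colliding = Layer("B", [BLOB])    # B happens to contain the same full path
```

In this model, follow_redirect([B_blocking, C], BLOB) raises OSError with
EIO, and follow_redirect([B_colliding, C], BLOB) resolves in B instead of
C. Prefixing the redirect with an extra slash ("/" + BLOB) skips B in
both cases and resolves in C.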

> It doesn't seem so likely for images to overload /.cfs by mistake with
> anything other than a proper composefs blobs repository (which should
> have no hash conflicts) and in the unlikely case that the image does
> happen to have a rogue /.cfs file the container runtime can declare
> that image invalid for composefs mount.
>
> Another observation: this problem reminds me of the "follow origin" [1]
> feature I was working on a long time ago.
>
> [1] https://github.com/amir73il/linux/commits/ovl-follow-origin
>
> This work had a different use case and the patches (that follow directories)
> are not relevant for this use case, but the same principle could be applied
> to following metacopy to lowerdata by origin when the lowerdata cannot be
> found by redirect (or redirected to "" to signify disconnected path).
>
> Overlayfs copied up files and metacopy in particular have an "origin" xattr.
> The "origin" xattr holds a file handle of the origin inode and uuid of
> the origin layer.
> This xattr has some uses, but I would like to point out one particular use -
> in ovl_lookup() we use ovl_check_origin() to find a lower non-dir, so
> that we can use its i_ino for the overlay inode.
>
> We do that also for non-metacopy and non-redirect. The special thing about
> ovl_check_origin() is that it is blind to the problem that you described.
> Unless the origin inode was deleted from the filesystem, overlayfs will find it.
> The path of the found "origin" may not be known to overlayfs (i.e. disconnected
> dentry), but it also does not matter to overlayfs.
>
> The reason I am telling you about this is because in the case that composefs
> is used to create composefs layers on a specific machine or in a specific
> local network, where the blobs are going to be stored on a specific
> shared backing filesystem, using "origin" instead of "redirect" as the
> way to refer to the lower-data might not be such a bad idea.
>
> For example, in LSFMM, Lennart presented the idea of a system service that would
> be able to provide "signed composefs image creation" services for unprivileged
> containers from standard OCI images. In that case, creating images
> optimized for a specific local blobs repository could be an option.
>
> Apart from enabling composefs mount for unprivileged containers, the centric
> system service scheme could also be used to "optimize" distributed composefs
> images to the local storage (i.e. convert //data redirect to origin reference).
>
> Combined with Alex's verity feature, I think we could allow following lowerdata
> by "origin" without any further opt-in from user, meaning:
> If lowerdata cannot be found by path (or has intentional "" redirect)
> AND if metacopy has verity xattr, defer to lazy lowerdata lookup,
> where lowerdata would be looked up by origin.
>
> Does this sound like something that would be useful for composefs ecosystem?
> If it is, I could send a patch for testing.

I can't think of any use for this currently. We very rarely work with
images whose backing store is tied to a particular machine like this;
generally the goal is to have something you can distribute.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@xxxxxxxxxx         alexander.larsson@xxxxxxxxx