On Thu, May 25, 2023 at 7:59 PM Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
>
> Hi Amir,
>
> Amir Goldstein <amir73il@xxxxxxxxx> writes:
>
> > On Thu, May 25, 2023 at 6:21 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> >>
> >> Something that came up about this in a discussion recently was
> >> multi-layer composefs style images. For example, this may be a useful
> >> approach for multi-layer container images.
> >>
> >> In such a setup you would have one lowerdata layer, but two real
> >> lowerdirs, like lowerdir=A:B::C. In this situation a file in B may
> >> accidentally have the same name as a file in C, causing a redirect
> >> from A to end up in B instead of C.
> >>
> >
> > I was under the impression that the names of the data blobs in C
> > are supposed to be content-derived names (hash).
> > Is this not the case, or is the concern about hash conflicts?
> >
> >> Would it be possible to have a syntax for redirects that means "only
> >> look up in lowerdata layers"? For example, a double-slash path:
> >> //some/file.
> >>
> >
> > Anything is possible if we can define the problem that needs to be solved.
> > In this case, I did not understand why the problem is limited to finding a file
> > by mistake in layer B.
> >
> > If there are several data layers A:B::C:D, why wouldn't we have the same
> > problem with a file name collision between C and D?
>
> The data layer is constructed in a way that files are stored by their
> hash, and the container runtime controls how it is built and
> maintained. So a file name collision would happen only on a hash
> collision.
>
> In contrast, for the other layers we have no control over what files
> are in the image, unless we limit ourselves to mounting only one EROFS
> as the first lower layer and make all the other lower layers data
> layers.
>
> Given your example above, A:B::C:D, if both A and B are EROFS we are
> limited in the files/directories that can be in B.
>
> e.g.
> we have A/foo with the following xattrs:
>
> trusted.overlay.metacopy=""
> trusted.overlay.redirect="/1e/de1743e73b904f16924c04fbd0b7fbfb7e45b8640241e7a08779e8f38fc20d"
>
> Now what would happen if /1e is present as a file in layer B? It will
> just cause the lookup for `foo` to fail with EIO, since the redirect
> didn't find any file in the layers below.
>

I understand the problem, and I understand why a // redirect to
data-only layers would be a simple and workable solution for composefs.

Unlike the rest of the changes to overlayfs that we worked on to support
composefs, this would really be a composefs-only on-disk format, because
it could not be generated by overlayfs itself, so we need Miklos to
chime in on whether this is acceptable.

I have one question, though:
If you place all the blobs under /.cfs/1e/..., is that really going to
be an issue?

I mean, the middle layers are not random stuff, and I don't think we
need to worry about malicious images blocking the data layers, because
malicious images have much easier ways of doing damage.

It doesn't seem likely that an image would overload /.cfs by mistake
with anything other than a proper composefs blobs repository (which
should have no hash conflicts), and in the unlikely case that an image
does happen to have a rogue /.cfs file, the container runtime can
declare that image invalid for composefs mount.

Another observation: this problem reminds me of the "follow origin" [1]
feature I was working on a long time ago.

[1] https://github.com/amir73il/linux/commits/ovl-follow-origin

This work had a different use case, and the patches (that follow
directories) are not relevant for this use case, but the same principle
could be applied to following metacopy to lowerdata by origin when the
lowerdata cannot be found by redirect (or is redirected to "" to signify
a disconnected path).

Overlayfs copied-up files, and metacopy files in particular, have an
"origin" xattr.
The "origin" xattr holds a file handle of the origin inode and the UUID
of the origin layer. This xattr has several uses, but I would like to
point out one in particular: in ovl_lookup() we use ovl_check_origin()
to find a lower non-dir, so that we can use its i_ino for the overlay
inode. We do that also for non-metacopy and non-redirect files.

The special thing about ovl_check_origin() is that it is blind to the
problem that you described. Unless the origin inode was deleted from the
filesystem, overlayfs will find it. The path of the found "origin" may
not be known to overlayfs (i.e. a disconnected dentry), but that does
not matter to overlayfs.

I am telling you about this because, in the case that composefs is used
to create composefs layers on a specific machine or in a specific local
network, where the blobs are going to be stored on a specific shared
backing filesystem, using "origin" instead of "redirect" as the way to
refer to the lowerdata might not be such a bad idea.

For example, at LSFMM, Lennart presented the idea of a system service
that would provide "signed composefs image creation" services for
unprivileged containers from standard OCI images. In that case, creating
images optimized for a specific local blobs repository could be an
option.

Apart from enabling composefs mounts for unprivileged containers, such a
central system service could also be used to "optimize" distributed
composefs images for the local storage (i.e. convert a //data redirect
to an origin reference).

Combined with Alex's verity feature, I think we could allow following
lowerdata by "origin" without any further opt-in from the user, meaning:
if lowerdata cannot be found by path (or has an intentional "" redirect)
AND the metacopy has a verity xattr, defer to a lazy lowerdata lookup,
where lowerdata would be looked up by origin.

Does this sound like something that would be useful for the composefs
ecosystem? If it is, I could send a patch for testing.

Thanks,
Amir.
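For illustration: the example redirect quoted above appears to use a 64-character hex digest fanned out under a two-character directory (/1e/de17...), and the /.cfs question suggests rooting the blob repository under /.cfs. Below is a minimal userspace sketch of building such a redirect value, assuming exactly that layout. `cfs_blob_redirect()` is a hypothetical helper, not a composefs or overlayfs API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<prefix>/<first 2 hex chars>/<remaining 62>"
 * from a 64-character hex digest. The two-character fan-out mirrors the
 * example redirect xattr; `prefix` may be "" or, e.g., "/.cfs".
 * Returns 0 on success, -1 on a malformed digest or a too-small buffer. */
static int cfs_blob_redirect(char *buf, size_t len, const char *prefix,
                             const char *digest)
{
    assert(buf != NULL && prefix != NULL && digest != NULL);

    /* Assumption: digests are exactly 64 hex characters (e.g. SHA-256). */
    if (strlen(digest) != 64)
        return -1;
    for (const char *p = digest; *p; p++)
        if (!isxdigit((unsigned char)*p))
            return -1;

    /* "%.2s" prints only the first two characters of the digest. */
    int n = snprintf(buf, len, "%s/%.2s/%s", prefix, digest, digest + 2);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With an empty prefix this reproduces the redirect value from the example xattr; with "/.cfs" it yields the /.cfs/1e/... form. Because the fan-out is the same either way, moving the repository under /.cfs only changes the prefix.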
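Here is a toy model of the collision Giuseppe describes and of the proposed "//" redirect semantics. It is a userspace sketch, not ovl_lookup(). Layers are modeled as lists of regular files. The rule that a non-directory on the path blocks the walk in that layer is a simplification of the reported behavior, where the redirect lookup fails and surfaces as EIO. `struct toy_layer` and `toy_resolve_redirect()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy lower layer: the regular files it contains (directories are implied
 * by file paths). This is not how overlayfs represents layers. */
struct toy_layer {
    const char *name;
    bool data_only;           /* listed after "::" in lowerdir= */
    const char *const *files; /* NULL-terminated list of absolute paths */
};

/* 1: `path` is a file in the layer; -1: a proper prefix of `path` is a
 * regular file, so the walk cannot descend; 0: not present at all. */
static int toy_layer_lookup(const struct toy_layer *l, const char *path)
{
    size_t plen = strlen(path);

    for (const char *const *f = l->files; *f; f++) {
        size_t flen = strlen(*f);

        if (flen == plen && memcmp(*f, path, plen) == 0)
            return 1;
        if (flen < plen && memcmp(*f, path, flen) == 0 && path[flen] == '/')
            return -1;
    }
    return 0;
}

/* Resolve a metacopy redirect top to bottom over the layers below the
 * metacopy. A leading "//" (the syntax proposed in this thread) skips the
 * regular lower layers and searches data-only layers only.
 * Returns the name of the layer holding the data, or NULL (EIO). */
static const char *toy_resolve_redirect(const struct toy_layer *layers,
                                        size_t n, const char *redirect)
{
    assert(redirect != NULL);

    bool data_only = strncmp(redirect, "//", 2) == 0;
    const char *path = data_only ? redirect + 1 : redirect;

    for (size_t i = 0; i < n; i++) {
        if (data_only && !layers[i].data_only)
            continue;

        int r = toy_layer_lookup(&layers[i], path);
        if (r == 1)
            return layers[i].name;
        if (r == -1)
            return NULL; /* blocked by a non-directory in this layer */
    }
    return NULL;
}
```

Suppose B = { "/1e" } and C = { "/1e/de1743" }, with C data-only. The plain redirect fails, while the "//" form resolves to C. If B instead held the full blob path, the plain redirect would land in B. That is the accidental match described at the start of the thread, and the "//" form still reaches C.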
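The fallback rule in the proposal above can be written as a small decision function. This is a hedged sketch. `struct toy_metacopy` and the `LOWERDATA_*` values are hypothetical summaries of lookup state, not overlayfs structures. The extra `has_origin` check is an added assumption, since a lookup by origin needs an origin xattr to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a metacopy inode at lookup time. */
struct toy_metacopy {
    const char *redirect; /* NULL if absent; "" = intentionally disconnected */
    bool found_by_path;   /* lowerdata located via path/redirect lookup */
    bool has_verity;      /* metacopy carries a verity digest xattr */
    bool has_origin;      /* "origin" xattr (file handle + layer UUID) */
};

enum toy_lowerdata_lookup {
    LOWERDATA_BY_PATH,   /* use the path/redirect result */
    LOWERDATA_BY_ORIGIN, /* defer to a lazy lookup by origin file handle */
    LOWERDATA_EIO,       /* nothing safe to follow */
};

static enum toy_lowerdata_lookup
toy_choose_lowerdata(const struct toy_metacopy *m)
{
    assert(m != NULL);

    bool disconnected = m->redirect != NULL && m->redirect[0] == '\0';

    if (m->found_by_path && !disconnected)
        return LOWERDATA_BY_PATH;
    /* Assumption: following origin without opt-in is acceptable only
     * because verity can check the content of whatever inode the file
     * handle resolves to. */
    if (m->has_verity && m->has_origin)
        return LOWERDATA_BY_ORIGIN;
    return LOWERDATA_EIO;
}
```

The ordering keeps the existing path-based behavior whenever it succeeds. Origin is consulted only as a fallback, and only when verity is present, matching the "no further opt-in" condition in the proposal.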