Hi Colin,
On 2023/3/4 22:59, Colin Walters wrote:
> On Fri, Mar 3, 2023, at 12:37 PM, Gao Xiang wrote:
>> Actually since you're container guys, I would like to mention
>> a way to directly reuse OCI tar data and not sure if you
>> have some interest as well, that is just to generate EROFS
>> metadata which could point to the tar blobs so that data itself
>> is still the original tar, but we could add fsverity + IMMUTABLE
>> to these blobs rather than the individual untarred files.
>>  - OCI layer diff IDs in the OCI spec [1] are guaranteed;
>
> The https://github.com/vbatts/tar-split approach addresses this
> problem domain adequately I think.
Thanks for the interest and the comment.
I wasn't aware of this project, and I'm not sure whether tar-split
helps with mounting tar archives; maybe I'm missing something?
As for EROFS, as long as we support subpage block sizes (tar file
data is only 512-byte aligned), it's entirely possible to refer to
the original tar data without modifying the tar stream.
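To illustrate the idea, here is a minimal user-space sketch (not EROFS
code) of the kind of index such metadata would need: for each regular
file, the offset and size of its data inside the untouched tar blob.
It uses Python's standard tarfile module; index_tar() and
make_demo_tar() are hypothetical helper names made up for this example.

```python
import io
import tarfile

BLOCK = 512  # tar record size: file data always starts on a 512-byte boundary


def index_tar(blob: bytes):
    """Return (name, data_offset, size) for each regular file in a tar blob."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for member in tar:
            if member.isreg():
                entries.append((member.name, member.offset_data, member.size))
    return entries


def make_demo_tar(files: dict) -> bytes:
    """Build a small uncompressed tar in memory (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


blob = make_demo_tar({"a.txt": b"hello", "b.bin": b"x" * 1000})
index = index_tar(blob)
for name, offset, size in index:
    # The payload is addressable in place; the blob itself is never rewritten.
    assert offset % BLOCK == 0
    print(name, offset, size)
```

Every data offset is a multiple of 512 but generally not of 4096,
which is why a 4k-only block size would not be enough here.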
> Correct me if I'm wrong, but having erofs point to underlying tar
> wouldn't by default get us page cache sharing or even the
> "opportunistic" disk sharing that composefs brings, unless userspace
> did something like attempting to dedup files in the tar stream via
> hashing and using reflinks on the underlying fs. And then doing
> reflinks would require alignment inside the stream, right? The
> https://fedoraproject.org/wiki/Changes/RPMCoW change is very similar
> in that it's proposing a modification of the RPM format to 4k align
> files in the stream for this reason. But that's exactly it, then it's
> a new tweaked format and not identical to what came before, so the
> "compatibility" rationale is actually weakened a lot.

Hmm, I don't think userspace needs to dedupe files in the tar
stream.
As you said, "opportunistic" finer-grained disk sharing across all tar
streams can be handled by reflinks or other mechanisms in the
underlying filesystem (like XFS) or in virtual block devices (like
device mapper). That is not because EROFS cannot do on-disk dedupe; it
is just that in this mode EROFS only uses the original tar blobs, so
EROFS is not the one to resolve on-disk sharing. However, since the
original tar blob is used, the tar stream data stays unchanged (with
the same diffID) while the container is running.
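As a rough user-space illustration of the points above (a sketch only,
not something EROFS or any existing tool does): the code below groups
identical files inside an uncompressed tar blob by content hash, checks
whether each copy happens to start on an assumed 4096-byte filesystem
block boundary (which a reflink-style approach would care about), and
computes the diffID as the SHA-256 of the uncompressed tar. FS_BLOCK,
reflink_candidates(), diff_id() and make_demo_tar() are hypothetical
names chosen for this example.

```python
import hashlib
import io
import tarfile

FS_BLOCK = 4096  # assumed block size of the underlying filesystem


def diff_id(blob: bytes) -> str:
    """DiffID of a layer: digest over the uncompressed tar archive."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def reflink_candidates(blob: bytes):
    """Map content digest -> copies and alignment, for duplicated files only."""
    groups = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for m in tar:
            if not m.isreg() or m.size == 0:
                continue
            data = blob[m.offset_data:m.offset_data + m.size]
            digest = hashlib.sha256(data).hexdigest()
            groups.setdefault(digest, []).append((m.name, m.offset_data))
    return {
        digest: {
            "copies": copies,
            "aligned": all(off % FS_BLOCK == 0 for _, off in copies),
        }
        for digest, copies in groups.items()
        if len(copies) > 1
    }


def make_demo_tar(files: dict) -> bytes:
    """Build a small uncompressed tar in memory (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


blob = make_demo_tar({
    "layer/a": b"y" * 1000,
    "layer/b": b"y" * 1000,   # same content as layer/a
    "layer/c": b"z" * 10,
})
before = diff_id(blob)
candidates = reflink_candidates(blob)  # read-only scan of the blob
after = diff_id(blob)
print(candidates, before == after)
```

In this demo the duplicated files sit at 512-byte-aligned offsets that
are not 4k aligned, and the diffID is the same before and after the
scan because the blob is only read, never rewritten.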
As a kernel filesystem, if two files have identical content, we could
back them with the same inode address space, even if their inode
metadata differs slightly (uid, gid, mode, nlink, etc.). That is
entirely possible for an in-kernel filesystem, even though the Linux
kernel doesn't currently implement finer page cache sharing, so EROFS
can support page cache sharing of files across all tar streams if needed.
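A toy user-space model of that idea, purely for illustration: inodes
keep their own metadata, but cached data is keyed by a content digest,
so two files with equal content are served from one cached copy. This
says nothing about how the kernel side would actually be implemented;
Inode, SharedPageCache and digest_of() are hypothetical names, and the
digest is assumed to be computed when the image metadata is built.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    """Per-file metadata plus a content digest computed at build time."""
    name: str
    uid: int
    mode: int
    digest: str


class SharedPageCache:
    """One cached copy per distinct content, shared by all matching inodes."""

    def __init__(self, backing):
        self.backing = backing  # digest -> bytes, stand-in for the tar blobs
        self.spaces = {}        # digest -> cached data (one "address space")
        self.loads = 0          # number of simulated reads from disk

    def read(self, inode: Inode) -> bytes:
        if inode.digest not in self.spaces:
            self.spaces[inode.digest] = self.backing[inode.digest]
            self.loads += 1
        return self.spaces[inode.digest]


def digest_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


libc = b"\x7fELF" + b"\0" * 64
backing = {digest_of(libc): libc}
a = Inode("layer1/usr/lib/libc.so", uid=0, mode=0o755, digest=digest_of(libc))
b = Inode("layer2/usr/lib/libc.so", uid=1000, mode=0o644, digest=digest_of(libc))

cache = SharedPageCache(backing)
first = cache.read(a)
second = cache.read(b)
print(cache.loads)
```

The two inodes differ in uid and mode, yet the second read does not
trigger another load because both map to the same cached content.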
Thanks,
Gao Xiang