Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay


On 2023/3/4 23:29, Gao Xiang wrote:
Hi Colin,

On 2023/3/4 22:59, Colin Walters wrote:


On Fri, Mar 3, 2023, at 12:37 PM, Gao Xiang wrote:

Actually, since you're container folks, I'd like to mention a way
to directly reuse OCI tar data, in case you're interested as well:
just generate EROFS metadata that points into the tar blobs, so the
data itself is still the original tar, while we could add fsverity +
IMMUTABLE to these blobs rather than to the individual untarred
files.

   - OCI layer diff IDs in the OCI spec [1] are guaranteed;
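To make the "metadata pointing into the tar blob" idea concrete, here is a minimal Python sketch, not EROFS code: for each regular file it records where the data starts inside an uncompressed tar and how long it is, then reads file contents straight from the untouched blob. The helper names (build_tar, index_tar, read_file) are hypothetical, and the sketch relies on TarInfo.offset_data, an attribute CPython's tarfile fills in when reading an archive.

```python
import io
import tarfile


def build_tar(files):
    """Build an uncompressed tar blob in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def index_tar(blob):
    """Map each regular file to (data_offset, size) inside the unmodified blob.

    This plays the role of the filesystem metadata: it only records where the
    data already lives; the tar bytes themselves are never rewritten.
    """
    index = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_file(blob, index, name):
    """Read a file's contents directly from the original tar bytes."""
    offset, size = index[name]
    return blob[offset:offset + size]
```

A real image would keep such extents in on-disk filesystem metadata rather than a Python dict; the point is only that the blob bytes (and anything computed over them, like fsverity digests) never change.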

The https://github.com/vbatts/tar-split approach addresses this problem domain adequately I think.

Thanks for the interest and comment.

I wasn't aware of this project, and I'm not sure whether tar-split
helps with mounting tar content; maybe I'm missing something?

As for EROFS, as long as we support subpage block sizes, it's
entirely possible to refer to the original tar data without
modifying the tar stream.
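The subpage block size requirement follows from tar's layout: every header and every file's data is padded to 512-byte records, so file data always starts at a multiple of 512 but usually not at a multiple of 4096. A rough Python check of that, assuming an uncompressed tar written by CPython's tarfile (the helper names are illustrative only):

```python
import io
import tarfile

TAR_RECORD = 512  # tar headers and file data are padded to 512-byte records


def build_tar(files):
    """Build an uncompressed tar blob in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def data_offsets(blob):
    """Offsets where each regular file's data begins inside the blob."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        return [m.offset_data for m in tar if m.isfile()]


def aligned_fraction(offsets, block_size):
    """Fraction of data offsets that start on a block_size boundary."""
    if not offsets:
        return 1.0
    return sum(off % block_size == 0 for off in offsets) / len(offsets)


if __name__ == "__main__":
    blob = build_tar({"a": b"x" * 5, "b": b"y" * 100, "c": b"z" * 3000})
    offs = data_offsets(blob)
    # A 512-byte block size can address every file's data in place;
    # a 4096-byte block size generally cannot.
    print(aligned_fraction(offs, TAR_RECORD), aligned_fraction(offs, 4096))
```

Every offset passes the 512-byte check by construction, while typically few or none pass the 4096-byte one, which is also why block-granular reflink sharing on a 4k-block host filesystem would need a re-aligned stream.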


Correct me if I'm wrong, but having erofs point to the underlying tar wouldn't by default get us page cache sharing or even the "opportunistic" disk sharing that composefs brings, unless userspace did something like attempting to dedup files in the tar stream via hashing and using reflinks on the underlying fs.  And then doing reflinks would require alignment inside the stream, right?  The https://fedoraproject.org/wiki/Changes/RPMCoW change is very similar in that it's proposing a modification of the RPM format to 4k-align files in the stream for this reason.  But that's exactly it: then it's a new, tweaked format and not identical to what came before, so the "compatibility" rationale is actually weakened a lot.

Hmm... I think userspace doesn't need to dedupe files in the
tar stream.



As you said, "opportunistic" finer-grained disk sharing across all
tar streams can be handled by reflinks or similar mechanisms in the
underlying filesystem (like XFS) or in virtual block devices (like
device mapper).

This is not because EROFS cannot do on-disk dedupe; it's just that
in this mode EROFS only uses the original tar blobs, so EROFS is not
the component that should resolve on-disk sharing.  On the other
hand, since the original tar blob is used, the tar stream data stays
unchanged (with the same diffID) while the container is running.
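Why keeping the blob untouched matters for the diffID: in the OCI image spec, a layer's DiffID is the sha256 digest of its uncompressed tar. A tiny Python illustration (function names are illustrative, and the zero-filled bytes are only a stand-in for a real layer) that reading from the original bytes keeps the DiffID valid, while any repacking, such as adding alignment padding, does not:

```python
import hashlib


def diff_id(uncompressed_tar: bytes) -> str:
    """OCI-style DiffID: sha256 over the uncompressed layer tar bytes."""
    return "sha256:" + hashlib.sha256(uncompressed_tar).hexdigest()


def still_matches(expected_diff_id: str, runtime_blob: bytes) -> bool:
    """True if the bytes the filesystem reads from still hash to the DiffID."""
    return diff_id(runtime_blob) == expected_diff_id


if __name__ == "__main__":
    layer = b"\0" * 1024  # stand-in for a real uncompressed layer tarball
    expected = diff_id(layer)
    print(still_matches(expected, layer))                # True: blob used as-is
    print(still_matches(expected, layer + b"\0" * 512))  # False: repacked/padded
```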

As a kernel filesystem, if two files have identical data, we could
treat them as sharing the same inode address space, even if their
inode metadata differs slightly (uid, gid, mode, nlink, etc).  That
is entirely possible for an in-kernel filesystem, even though the
Linux kernel doesn't currently implement finer-grained page cache
sharing, so EROFS can support page cache sharing of files across all
tar streams if needed.
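A userspace toy model of "equal files share one address space despite different metadata": files are grouped by a content digest, each group stands for a single shared address space, and per-file uid/gid/mode stay separate. Matching files by SHA-256 is an assumption made for this sketch (the mail doesn't say how equal files would be detected), and the Inode class and helpers are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    """Per-file metadata; may differ between files that share data."""
    path: str
    uid: int
    gid: int
    mode: int
    data: bytes


def group_by_content(inodes):
    """Map content digest -> inodes whose data is byte-identical.

    Each group models one shared address space: the data is cached once,
    while every member keeps its own uid/gid/mode.
    """
    groups = {}
    for ino in inodes:
        key = hashlib.sha256(ino.data).hexdigest()
        groups.setdefault(key, []).append(ino)
    return groups


def cached_bytes(groups):
    """Bytes of page cache needed if each group's data is cached only once."""
    return sum(len(members[0].data) for members in groups.values())
```

For example, the same binary shipped in two layers with different owners and modes lands in one group, so its data would be cached once instead of twice.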

By the way, to avoid misunderstanding: the currently workable ways
of sharing Linux page cache don't _strictly_ require the real inodes
to be the same inode (as stackable filesystems like overlayfs do);
they just need the data shared among different inodes to be kept
contiguous in one address space, which means:

  1) we could reuse the blob (the tar stream) address space to share
     page cache; that is actually what Jingbo did for fscache page
     cache sharing:
     https://lore.kernel.org/r/20230203030143.73105-1-jefflexu@xxxxxxxxxxxxxxxxx

  2) create a virtual inode (or reuse the address space of one of the
     real inodes) to share data between real inodes.

Either approach enables page cache sharing of inodes with the same
data across different filesystems, and both are practical without
extra linux-mm improvements.
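Approach 1) can be pictured with a toy page cache keyed by (blob, page index) instead of by inode: any inode whose data is a range of the same blob, from whichever mount, reads through the same cached pages. This is a hedged userspace model only, not how fscache or EROFS implement it; BlobPageCache, the 4096-byte PAGE_SIZE and the (blob_id, offset, length) inode tuples are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this model


class BlobPageCache:
    """Toy page cache keyed by (blob_id, page_index) instead of by inode.

    Inodes are modelled as (blob_id, offset, length) extents of a blob, so
    any inodes referencing the same blob range read the same cached pages.
    """

    def __init__(self, blobs):
        self.blobs = blobs   # blob_id -> bytes (e.g. an uncompressed tar)
        self.pages = {}      # (blob_id, page_index) -> cached page bytes
        self.misses = 0      # pages that had to be "read from disk"

    def _page(self, blob_id, index):
        key = (blob_id, index)
        if key not in self.pages:
            self.misses += 1
            start = index * PAGE_SIZE
            self.pages[key] = self.blobs[blob_id][start:start + PAGE_SIZE]
        return self.pages[key]

    def read(self, blob_id, offset, length):
        """Read `length` bytes at `offset` within a blob through the cache."""
        out = bytearray()
        pos, end = offset, offset + length
        while pos < end:
            index, within = divmod(pos, PAGE_SIZE)
            chunk = self._page(blob_id, index)[within:within + (end - pos)]
            if not chunk:  # ran past the end of the blob
                break
            out += chunk
            pos += len(chunk)
        return bytes(out)
```

Reading the same extent through a second inode (say, from another mount) adds no cache misses, which is the sharing described above; approach 2) would instead key the cache by a per-content virtual inode.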

thanks,
Gao Xiang


Thanks,
Gao Xiang


