Re: [PATCH v2 0/6] Composefs: an opportunistically sharing verified image filesystem

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Tue, 17 Jan 2023 08:12:13 +0800

On 2023/1/16 23:27, Alexander Larsson wrote:
On Mon, 2023-01-16 at 21:26 +0800, Gao Xiang wrote:

I will stop talking about this overlay permission model since
there are more experienced folks working on this, although the SUID
stuff still looks dangerous to me as an end user:  IMHO, it's hard
for me to identify the proper sub-sub-subdir UID/GID in "objects"
at runtime, and such subdirs could be nested very deep, which is
different from a local fs on loopback devices or overlayfs.  I don't
know what an improper sub-sub-subdir UID/GID in "objects" could
cause.

It seems ostree currently uses "root" all the time for such
"objects" subdirs, but I'm not sure.

Instead what we have done with composefs is to make filesystem image
generation from the ostree repository 100% reproducible. Then we can

EROFS is 100% reproducible as well.

Really? So if I today, on Fedora 36, run:
# tar xvf oci-image.tar
# mkfs.erofs oci-dir/ oci.erofs

And then in 5 years, if someone on Debian 13 runs the same, with the
same tar file, then both oci.erofs files will have the same sha256
checksum?
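
The check being asked about can be sketched as follows. This is only an illustration of comparing two image files by SHA-256; the helper names `sha256_of_file` and `images_identical` are made up for this sketch and are not part of mkfs.erofs or composefs.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Two images built on different distributions are "the same" in the sense used here only if `images_identical` returns True for them.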

Why wouldn't it?  Reproducible builds are a MUST for Android use cases
as well.

Those are not quite the same requirements. A reproducible build in the
traditional sense is limited to a particular build configuration. You
define a set of tools for the build, and use the same ones for each
build, and get a fixed output. You don't expect to be able to change
e.g. the compiler and get the same result. Similarly, it is often the
case that different builds or versions of compression libraries give
different results, so you can't expect to use e.g. a different libz and
get identical images.

Yes, it may break between versions by mistake, but I think
reproducible builds are a basic functionality for all image
use cases.

How do you handle things like different versions or builds of
compression libraries creating different results? Do you guarantee to
not add any new backwards compat changes by default, or change any
default options? Do you guarantee that the files are read from
"oci-dir" in the same order each time? It doesn't look like it.

If you put it that way, why doesn't mkcomposefs have the
same issue, since it could also be broken by some bug?

libcomposefs defines a normalized form for everything like file order,
xattr orders, etc, and carefully normalizes everything such that we can
guarantee these properties. It is possible that some detail was missed,
because we're humans. But it was a very conscious and deliberate design
choice that is deeply encoded in the code and format. For example, this
is why we don't use compression but try to minimize size in other ways.
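
As a rough illustration of what such normalization buys, here is a minimal sketch. It assumes a toy in-memory tree and an invented byte layout; it is not the libcomposefs format or API. Entries and xattrs are emitted in sorted byte order with fixed-width little-endian lengths, so the digest does not depend on the order in which the files were discovered.

```python
import hashlib
import struct


def serialize_tree(entries):
    """Serialize a toy tree deterministically.

    `entries` maps path (str) -> {"mode": int, "xattrs": {str: bytes}, "data": bytes}.
    The layout is hypothetical and exists only for this sketch.
    """
    out = bytearray()
    # Sort paths by their encoded bytes so input order never matters.
    for path in sorted(entries, key=lambda p: p.encode("utf-8")):
        entry = entries[path]
        name = path.encode("utf-8")
        out += struct.pack("<I", len(name)) + name
        out += struct.pack("<I", entry["mode"])
        xattrs = entry.get("xattrs", {})
        out += struct.pack("<I", len(xattrs))
        # Sort xattr names as well, for the same reason.
        for key in sorted(xattrs, key=lambda k: k.encode("utf-8")):
            raw_key = key.encode("utf-8")
            value = xattrs[key]
            out += struct.pack("<I", len(raw_key)) + raw_key
            out += struct.pack("<I", len(value)) + value
        # Refer to file content by digest; no compression is involved.
        out += hashlib.sha256(entry["data"]).digest()
    return bytes(out)


def image_digest(entries):
    """Hex SHA-256 of the normalized serialization."""
    return hashlib.sha256(serialize_tree(entries)).hexdigest()
```

Building the same tree from two dicts with different insertion order gives the same `image_digest`, while changing any mode, xattr, or file content changes it.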

EROFS is reproducible since its dirents are all sorted by
its on-disk definition.  And its xattrs are also sorted if
images need to be reproducible.

I don't know what the difference is between these two kinds of
reproducible builds.  EROFS is designed for golden images: if
you pass the same set of configuration options to mkfs.erofs, it
should produce the same output; otherwise those are real
bugs and need to be fixed.

Compression algorithms could generate different outputs between
versions, and generally compressed data is stable for most
compression algorithms in a specific version but that is another
story.
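
A small sketch of why compressed bytes are a fragile basis for image checksums. As an assumption for illustration, two zlib compression levels stand in for two library versions; the compressed streams may differ, while the decompressed content is the same.

```python
import hashlib
import zlib

payload = b"read-only image data, repeated to make it compressible. " * 200

# Two encoder settings standing in for two library versions (an assumption
# of this sketch; real version differences are not reproduced here).
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)


def content_digest(compressed):
    """Hex SHA-256 of the decompressed content."""
    return hashlib.sha256(zlib.decompress(compressed)).hexdigest()


def stream_digest(compressed):
    """Hex SHA-256 of the compressed byte stream itself."""
    return hashlib.sha256(compressed).hexdigest()
```

`content_digest(fast)` and `content_digest(best)` always match, whereas `stream_digest` is only stable as long as the encoder and its settings are.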

EROFS can live without compression.

But really, personally I think the issue above is different from
loopback devices and may need to be resolved first. And if possible,
I hope it could be a new overlayfs feature for everyone.

Yeah. Independent of composefs, I think EROFS would be better if you
could just point it to a chunk directory at mount time rather than
having to route everything through a system-wide global cachefs
singleton. I understand that cachefs does help with the on-demand
download aspect, but when you don't need that it is just in the way.

I just checked your reply to Dave's review, and it seems that how
the composefs dir on-disk format works is also very similar to
EROFS, see:

https://docs.kernel.org/filesystems/erofs.html -- Directories

a block vs a chunk = dirent + names

cfs_dir_lookup -> erofs_namei + find_target_block_classic;
cfs_dir_lookup_in_chunk -> find_target_dirent.

Yeah, the dirent layout looks very similar. I guess great minds think
alike! My approach was simpler initially, but it kinda converged on
this when I started optimizing the kernel lookup code with binary
search.
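
The shared idea, a block (or chunk) holding a sorted dirent array followed by the names, searched with binary search, can be sketched like this. The 10-byte dirent layout and the helper names are hypothetical and chosen for this sketch; they are not the actual EROFS or composefs on-disk format.

```python
import struct

# Hypothetical dirent: 64-bit inode number, 16-bit name offset in the block.
DIRENT = struct.Struct("<QH")


def build_block(entries):
    """Pack {name (bytes): inode number} into dirents followed by names."""
    names = sorted(entries)  # sorted names make binary search possible
    header_size = DIRENT.size * len(names)
    dirents = bytearray()
    blob = bytearray()
    for name in names:
        dirents += DIRENT.pack(entries[name], header_size + len(blob))
        blob += name
    return bytes(dirents + blob)


def _name_at(block, index, count):
    """Name of dirent `index`; it ends where the next name starts."""
    _, start = DIRENT.unpack_from(block, index * DIRENT.size)
    if index + 1 < count:
        _, end = DIRENT.unpack_from(block, (index + 1) * DIRENT.size)
    else:
        end = len(block)
    return block[start:end]


def lookup(block, name):
    """Binary-search the block for `name`; return its inode number or None."""
    if not block:
        return None
    # The first name offset equals the size of the dirent array.
    _, first_offset = DIRENT.unpack_from(block, 0)
    count = first_offset // DIRENT.size
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = _name_at(block, mid, count)
        if candidate == name:
            return DIRENT.unpack_from(block, mid * DIRENT.size)[0]
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

Because the names are stored sorted, a lookup touches O(log n) dirents per block, and the same sorted layout is what makes the directory encoding independent of the order in which entries were added.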

Yes, great projects can occasionally be very similar to each
other, not to mention open source projects ;)

Anyway, I'm not opposed to Composefs if folks really like a
new read-only filesystem for this. That is almost all I'd like
to say about Composefs formally, have fun!
After all, open source projects can always be forked,
so (maybe) such is life.

It rather seems like another, incomplete EROFS from several points
of view.  Also see:
https://lore.kernel.org/all/1b192a85-e1da-0925-ef26-178b93d0aa45@xxxxxxxxxxxxx/T/#u

I will go on making a better EROFS, as I initially promised
the community.

Thanks,
Gao Xiang

Thanks,
Gao Xiang

Cool, thanks for the feedback.