Re: [LSF/MM/BFP TOPIC] Composefs vs erofs+overlay

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Tue, 7 Mar 2023 17:26:39 +0800

On 2023/3/7 17:07, Alexander Larsson wrote:
On Tue, Mar 7, 2023 at 9:34 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:

On 2023/3/7 16:21, Alexander Larsson wrote:
On Mon, Mar 6, 2023 at 5:17 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:

I tested the performance of "ls -lR" on the whole tree of
cs9-developer-rootfs.  It seems that the performance of erofs (generated
from mkfs.erofs) is slightly better than that of composefs.  While the
performance of erofs generated from mkfs.composefs is slightly worse
than that of composefs.

I suspect that the reason for the lower performance of mkfs.composefs
is the added overlay.fs-verity xattr to all the files. It makes the
image larger, and that means more i/o.

Actually you could move overlay.fs-verity to EROFS shared xattr area (or
even overlay.redirect but it depends) if needed, which could save some
I/Os for your workloads.

Shared xattrs can be used in this way as well if you care about such a
minor difference.  Actually, I think inlined xattrs are only meaningful
for selinux labels and capabilities in your workload.

Really? Could you expand on this, because I would think it will be
sort of the opposite. In my usecase, the erofs fs will be read by
overlayfs, which will probably access overlay.* pretty often.  At the
very least it will load overlay.metacopy and overlay.redirect for
every lookup.

Really.  In that way, it will behave much like the composefs on-disk
arrangement does now (in the composefs vdata area).

Because in that way, although an extra I/O is needed for verification,
it only happens when actually opening the file (so "ls -lR" is not
impacted), and the on-disk inodes are more compact.

All EROFS xattrs will be cached in memory, so accessing overlay.*
pretty often is not greatly impacted since no real I/O is involved
(IOW, only some CPU time is consumed).

So, I tried moving the overlay.digest xattr to the shared area, but
actually this made the performance worse for the ls case.

That is quite strange.  We'd like to open it up if needed.  BTW, did you
test EROFS with acl enabled all the time?

I have not looked into the cause in detail, but my guess is that ls
looks for the acl xattr, and such a negative lookup will cause erofs to
look at all the shared xattrs for the inode, which means they all end
up being loaded anyway. Of course, this will only affect ls (or other
cases that read the acl), so it's perhaps a bit uncommon.
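
As a rough illustration of the negative-lookup cost described above,
here is a toy model in C. It is only a sketch: the toy_xattr and
toy_inode types and the toy_getxattr() helper are hypothetical and do
not reflect the real EROFS on-disk format or kernel code. The point is
just that a lookup for a name that is not set has to examine every
shared entry before it can fail.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical types, not the EROFS on-disk layout. */
struct toy_xattr {
    const char *name;
    const char *value;
};

struct toy_inode {
    const struct toy_xattr *inline_xattrs;   /* stored next to the inode */
    size_t n_inline;
    const struct toy_xattr *const *shared;   /* references into a shared area */
    size_t n_shared;
};

/*
 * Look up `name` on `inode`.  Returns the value, or NULL if the name is
 * not set.  *shared_visits reports how many shared entries had to be
 * examined; in this model each visit stands for a potential extra read.
 */
static const char *toy_getxattr(const struct toy_inode *inode,
                                const char *name, size_t *shared_visits)
{
    *shared_visits = 0;

    /* Inline entries first: no shared-area access needed. */
    for (size_t i = 0; i < inode->n_inline; i++)
        if (strcmp(inode->inline_xattrs[i].name, name) == 0)
            return inode->inline_xattrs[i].value;

    /* Then every shared entry, until a match is found. */
    for (size_t i = 0; i < inode->n_shared; i++) {
        (*shared_visits)++;
        if (strcmp(inode->shared[i]->name, name) == 0)
            return inode->shared[i]->value;
    }

    /* Negative lookup: all shared entries were visited. */
    return NULL;
}
```

In this model, a query for "system.posix_acl_access" on an inode whose
shared list holds only overlay.* entries visits all of them and still
returns NULL, while a hit on an inline entry visits none.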

Yeah, in addition to that, I guess real acls could land in inlined
xattrs as well if they exist...

Did you ever consider putting a bloom filter in the h_reserved area of
erofs_xattr_ibody_header? Then it could return early without i/o
operations for keys that are not set for the inode. Not sure what the
computational cost of that would be though.
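
To make the bloom filter idea concrete, here is a minimal sketch in C.
Everything in it is an assumption for illustration: a 32-bit filter
word (standing in for a reserved header field), two hash probes per
name, and FNV-1a as the hash. The toy_* names are hypothetical and this
is not how EROFS implements anything.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_FILTER_HASHES 2   /* assumed number of probes per name */

/* FNV-1a with a seed mixed in; chosen only for illustration. */
static uint32_t toy_hash(const char *s, uint32_t seed)
{
    uint32_t h = UINT32_C(2166136261) ^ seed;

    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= UINT32_C(16777619);
    }
    return h;
}

/* Image-build time: set the bits for `name` in a 32-bit filter word. */
static uint32_t toy_filter_add(uint32_t filter, const char *name)
{
    for (uint32_t k = 0; k < TOY_FILTER_HASHES; k++)
        filter |= UINT32_C(1) << (toy_hash(name, k) % 32);
    return filter;
}

/*
 * Lookup time: false means the name is definitely not set, so the
 * caller can return "no such attribute" without touching any xattr
 * data.  true means "maybe present" and the full lookup is required.
 */
static bool toy_filter_may_contain(uint32_t filter, const char *name)
{
    for (uint32_t k = 0; k < TOY_FILTER_HASHES; k++)
        if (!(filter & (UINT32_C(1) << (toy_hash(name, k) % 32))))
            return false;
    return true;
}
```

The per-lookup cost in this sketch is a couple of short string hashes
and bit tests. With only 32 bits, false positives become likely once an
inode carries more than a handful of xattrs, so the filter would help
most for inodes with few attributes.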

Good idea!  Let me think about it, but isn't enabling the "noacl" mount
option preferred if acl is not needed in your use cases?
Optimizing negative xattr lookups might need more on-disk
improvements, since we haven't paid much attention to xattrs so
far (although "overlay.redirect" and "overlay.digest" seem fine
for composefs use cases).

BTW, if you are more interested in this approach, we could get in
touch in a more effective way to improve EROFS, in addition to
community emails, except for the userns stuff (I know it's useful
but I don't know the answers; maybe, as Christian said, we could
develop a new vfs feature to delegate a filesystem mount to an
unprivileged one [1].  I think that way is much safer for
kernel filesystems with an on-disk format.)

[1] https://lore.kernel.org/r/20230126082228.rweg75ztaexykejv@wittgenstein

Thanks,
Gao Xiang