Re: [LSF/MM/BPF TOPIC] Composefs vs erofs+overlay

On Mon, Mar 6, 2023 at 4:49 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> On 3/6/23 7:33 PM, Alexander Larsson wrote:
> > On Fri, Mar 3, 2023 at 2:57 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> >>
> >> On Mon, Feb 27, 2023 at 10:22 AM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2]
> >>> the Composefs filesystem. It is an opportunistically sharing,
> >>> validating, image-based filesystem, targeting use cases like
> >>> validated ostree root filesystems and validated container images
> >>> that share common files, as well as other image-based use cases.
> >>>
> >>> During the discussions of the composefs proposal (as seen on
> >>> LWN[3]), it has been proposed that (with some changes to overlayfs)
> >>> similar behaviour can be achieved by combining the overlayfs
> >>> "overlay.redirect" xattr with a read-only filesystem such as erofs.
> >>>
> >>> There are pros and cons to both these approaches, and the discussion
> >>> about their respective value has sometimes been heated. We would like
> >>> to have an in-person discussion at the summit, ideally also involving
> >>> more of the filesystem development community, so that we can reach
> >>> some consensus on what is the best approach.
> >>
> >> In order to better understand the behaviour and requirements of the
> >> overlayfs+erofs approach I spent some time implementing direct support
> >> for erofs in libcomposefs. So, with current HEAD of
> >> github.com/containers/composefs you can now do:
> >>
> >> $ mkcomposefs --digest-store=objects --format=erofs source-dir image.erofs
> >>
> >> This will produce an object store with the backing files, and an erofs
> >> file with the required overlayfs xattrs, including a made-up one
> >> called "overlay.fs-verity" containing the expected fs-verity digest
> >> for the lower dir. It also adds the required whiteouts to cover the
> >> 00-ff dirs from the lower dir.
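> >>
> >> As a sanity check, the generated xattrs can be inspected by loop
> >> mounting the image and dumping everything in the overlay namespace
> >> (the mount point here is just an example):
> >>
> >> # mount -t erofs -o ro,loop image.erofs /mnt/img
> >> # getfattr -d -m overlay -R /mnt/img
> >>
> >> This should list the overlay.redirect entries pointing into the
> >> object store, plus the overlay.fs-verity digests described above.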
> >>
> >> These erofs files are ordered similarly to the composefs files, and we
> >> give similar guarantees about their reproducibility, etc. So, they
> >> should be apples-to-apples comparable with the composefs images.
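> >>
> >> And since the images are reproducible, building twice from the same
> >> source dir should give bit-identical files:
> >>
> >> $ mkcomposefs --digest-store=objects --format=erofs source-dir a.erofs
> >> $ mkcomposefs --digest-store=objects --format=erofs source-dir b.erofs
> >> $ cmp a.erofs b.erofs && echo identical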
> >>
> >> Given this, I ran another set of performance tests on the original cs9
> >> rootfs dataset, again measuring the time of `ls -lR`. I also tried to
> >> measure the memory use like this:
> >>
> >> # echo 3 > /proc/sys/vm/drop_caches
> >> # systemd-run --scope sh -c 'ls -lR mountpoint > /dev/null; cat $(cat /proc/self/cgroup | sed -e "s|0::|/sys/fs/cgroup|")/memory.peak'
> >>
> >> (systemd-run --scope runs the command in its own transient cgroup, so
> >> its memory.peak file reports the peak memory use of just that run.)
> >>
> >> These are the alternatives I tried:
> >>
> >> xfs: the source of the image, regular dir on xfs
> >> erofs: the image.erofs above, on loopback
> >> erofs dio: the image.erofs above, on loopback with --direct-io=on
> >> ovl: erofs above combined with overlayfs
> >> ovl dio: erofs dio above combined with overlayfs
> >> cfs: composefs mount of image.cfs
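> >>
> >> The ovl setups are mounted roughly like this (paths illustrative,
> >> exact options approximate):
> >>
> >> # mount -t erofs -o ro,loop image.erofs /mnt/erofs
> >> # mount -t overlay overlay -o ro,metacopy=on,redirect_dir=on,lowerdir=/mnt/erofs:objects /mnt/ovl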
> >>
> >> All tests use the same objects dir, stored on xfs. The erofs and
> >> overlay implementations are from a stock 6.1.13 kernel, and the
> >> composefs module is from github HEAD.
> >>
> >> I tried loopback both with and without the direct-io option, because
> >> without direct-io enabled the kernel will double-cache the loop-backed
> >> data, as per[1].
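> >>
> >> For the dio variants the loop device is set up with direct-io before
> >> mounting, something like:
> >>
> >> # losetup --direct-io=on --find --show image.erofs
> >> /dev/loop0
> >> # mount -t erofs -o ro /dev/loop0 /mnt/erofs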
> >>
> >> The produced images are:
> >>  8.9M image.cfs
> >> 11.3M image.erofs
> >>
> >> And gives these results:
> >>            | Cold cache | Warm cache | Mem use
> >>            |   (msec)   |   (msec)   |   (MB)
> >> -----------+------------+------------+---------
> >> xfs        |   1449     |    442     |    54
> >> erofs      |    700     |    391     |    45
> >> erofs dio  |    939     |    400     |    45
> >> ovl        |   1827     |    530     |   130
> >> ovl dio    |   2156     |    531     |   130
> >> cfs        |    689     |    389     |    51
> >
> > It has been noted that the readahead done by kernel_read() may pull
> > unrelated data into memory, which skews the results in favour of
> > workloads that consume all the filesystem metadata (such as the
> > ls -lR use case in the above test). In the table above this favours
> > composefs (which uses kernel_read() in some codepaths) as well as
> > non-dio erofs (the non-dio loopback device uses readahead too).
> >
> > I updated composefs to not use kernel_read here:
> >   https://github.com/containers/composefs/pull/105
> >
> > And a new kernel patch-set based on this is available at:
> >   https://github.com/alexlarsson/linux/tree/composefs
> >
> > The resulting table is now (dropping the non-dio erofs):
> >
> >            | Cold cache | Warm cache | Mem use
> >            |   (msec)   |   (msec)   |   (MB)
> > -----------+------------+------------+---------
> > xfs        |   1449     |    442     |   54
> > erofs dio  |    939     |    400     |   45
> > ovl dio    |   2156     |    531     |  130
> > cfs        |    833     |    398     |   51
> >
> > And the equivalent results with ext4 instead of xfs as the backing
> > filesystem ("ovl lazy" being overlayfs with the proposed lazyfollow
> > patches):
> >
> >            | Cold cache | Warm cache | Mem use
> >            |   (msec)   |   (msec)   |   (MB)
> > -----------+------------+------------+---------
> > ext4       |   1135     |    394     |    54
> > erofs dio  |    922     |    401     |    45
> > ovl dio    |   1810     |    532     |   149
> > ovl lazy   |   1063     |    523     |    87
> > cfs        |    768     |    459     |    51
> >
> > So, while cfs is somewhat worse now for this particular use case, my
> > overall analysis still stands.
> >
>
> Hi,
>
> I tested your patch removing kernel_read(), and here are the statistics
> from my test environment.
>
>
> Setup
> ======
> CPU: x86_64 Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> Disk: cloud disk, 11800 IOPS upper limit
> OS: Linux v6.2
> FS of backing objects: xfs
>
>
> Image size
> ===========
> 8.6M large.composefs (with --compute-digest)
> 8.9M large.erofs (mkfs.erofs)
> 11M  large.cps.in.erofs (mkfs.composefs --compute-digest --format=erofs)
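>
> For reference, the images were created roughly as follows, with
> "rootfs/" standing in for the cs9 source tree:
>
> $ mkfs.composefs --compute-digest rootfs/ large.composefs
> $ mkfs.erofs large.erofs rootfs/
> $ mkfs.composefs --compute-digest --format=erofs rootfs/ large.cps.in.erofs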
>
>
> Perf of "ls -lR"
> ================
>                                                    | uncached | cached
>                                                    |   (ms)   |  (ms)
> ---------------------------------------------------+----------+--------
> composefs                                          |   519    |  178
> erofs (mkfs.erofs, DIRECT loop)                    |   497    |  192
> erofs (mkfs.composefs --format=erofs, DIRECT loop) |   536    |  199
>
> I tested the performance of "ls -lR" on the whole tree of the
> cs9-developer-rootfs.  It seems that the performance of the erofs
> image generated by mkfs.erofs is slightly better than that of
> composefs, while the performance of the erofs image generated by
> mkfs.composefs is slightly worse than that of composefs.

I suspect that the reason for the lower performance of the
mkfs.composefs image is the overlay.fs-verity xattr it adds to all the
files. That makes the image larger (cf. the 8.9M vs 11M sizes above),
which means more i/o.

> The uncached performance differs somewhat from the numbers given by
> Alexander Larsson.  I think this may be due to the different test
> environment, as my test machine is a server with robust performance,
> using a cloud disk as storage.
>
> It's just a simple test without further analysis, as it's a bit late for
> me :)

Yeah, and for the record, I'm not claiming that my tests contain any
high degree of analysis or rigour either. They are short simple test
runs that give a rough estimate of the overall performance of metadata
operations. What is interesting here is if there are large or
unexpected differences, and from that point of view our results are
basically the same.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@xxxxxxxxxx         alexander.larsson@xxxxxxxxx




