Re: [LSF/MM/BPF TOPIC] Composefs vs erofs+overlay

On Fri, Mar 3, 2023 at 2:57 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>
> On Mon, Feb 27, 2023 at 10:22 AM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
> > Composefs filesystem. It is an opportunistically sharing, validating,
> > image-based filesystem, targeting use cases like validated ostree
> > root filesystems, validated container images that share common files,
> > and other image-based use cases.
> >
> > During the discussions of the composefs proposal (as seen on LWN[3])
> > it has been proposed that, with some changes to overlayfs, similar
> > behaviour can be achieved by combining the overlayfs
> > "overlay.redirect" xattr with a read-only filesystem such as erofs.
> >
> > There are pros and cons to both of these approaches, and the discussion
> > about their respective value has sometimes been heated. We would like
> > to have an in-person discussion at the summit, ideally also involving
> > more of the filesystem development community, so that we can reach
> > some consensus on the best approach.
>
> In order to better understand the behaviour and requirements of the
> overlayfs+erofs approach I spent some time implementing direct support
> for erofs in libcomposefs. So, with current HEAD of
> github.com/containers/composefs you can now do:
>
> $ mkcomposefs --digest-store=objects --format=erofs source-dir image.erofs
>
> This will produce an object store with the backing files, and an erofs
> file with the required overlayfs xattrs, including a made-up one
> called "overlay.fs-verity" containing the expected fs-verity digest
> for the lower dir. It also adds the required whiteouts to cover the
> 00-ff dirs from the lower dir.
>
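(As an aside, this is roughly how such an image is meant to be consumed;
the mount points below are made up, and a stock overlayfs will simply
ignore the proposed "overlay.fs-verity" xattr:)

# mount -t erofs -o ro,loop image.erofs /mnt/erofs-meta
# mount -t overlay overlay -o ro,metacopy=on,redirect_dir=follow \
    -o lowerdir=/mnt/erofs-meta:objects /mnt/image
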
> These erofs files are ordered similarly to the composefs files, and we
> give similar guarantees about their reproducibility, etc. So, they
> should be apples-to-apples comparable with the composefs images.
>
> Given this, I ran another set of performance tests on the original cs9
> rootfs dataset, again measuring the time of `ls -lR`. I also tried to
> measure the memory use like this:
>
> # echo 3 > /proc/sys/vm/drop_caches
> # systemd-run --scope sh -c 'ls -lR mountpoint > /dev/null; cat $(cat
> /proc/self/cgroup | sed -e "s|0::|/sys/fs/cgroup|")/memory.peak'
>
> These are the alternatives I tried:
>
> xfs: the source of the image, regular dir on xfs
> erofs: the image.erofs above, on loopback
> erofs dio: the image.erofs above, on loopback with --direct-io=on
> ovl: erofs above combined with overlayfs
> ovl dio: erofs dio above combined with overlayfs
> cfs: composefs mount of image.cfs
>
> All tests use the same objects dir, stored on xfs. The erofs and
> overlay implementations are from a stock 6.1.13 kernel, and the
> composefs module is from github HEAD.
>
> I tried loopback both with and without the direct-io option, because
> without direct-io enabled the kernel will double-cache the loopbacked
> data, as per [1].
>
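(For reference, this is roughly how the loopback variants can be set up;
the device name is whatever losetup hands back and the mount point is
made up:)

# LOOP=$(losetup -f --show --direct-io=on image.erofs)  # drop --direct-io=on for the non-dio case
# mount -t erofs -o ro "$LOOP" /mnt/erofs-meta
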
> The produced images are:
>  8.9M image.cfs
> 11.3M image.erofs
>
> And gives these results:
>            | Cold cache | Warm cache | Mem use
>            |   (msec)   |   (msec)   |  (MB)
> -----------+------------+------------+---------
> xfs        |   1449     |    442     |    54
> erofs      |    700     |    391     |    45
> erofs dio  |    939     |    400     |    45
> ovl        |   1827     |    530     |   130
> ovl dio    |   2156     |    531     |   130
> cfs        |    689     |    389     |    51

It has been noted that the readahead done by kernel_read() may pull
unrelated data into memory, which skews the results in favour of
workloads that consume all the filesystem metadata (such as the
ls -lR use case in the above test). In the table above this favours
composefs (which uses kernel_read() in some codepaths) as well as
non-dio erofs (a non-dio loopback device uses readahead too).
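
(As a general illustration of the effect, not of what composefs does
internally: a single small read of a file typically pulls a larger
readahead window into the page cache, which can be seen with the
util-linux fincore tool, assuming it is installed:)

# echo 3 > /proc/sys/vm/drop_caches
# dd if=image.erofs of=/dev/null bs=4k count=1
# fincore image.erofs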

I updated composefs to not use kernel_read here:
  https://github.com/containers/composefs/pull/105

And a new kernel patch-set based on this is available at:
  https://github.com/alexlarsson/linux/tree/composefs

The resulting table is now (dropping the non-dio erofs):

           | Cold cache | Warm cache | Mem use
           |   (msec)   |   (msec)   |  (MB)
-----------+------------+------------+---------
xfs        |   1449     |    442     |   54
erofs dio  |    939     |    400     |   45
ovl dio    |   2156     |    531     |  130
cfs        |    833     |    398     |   51

And the same test, with the backing filesystem on ext4 instead of xfs:

           | Cold cache | Warm cache | Mem use
           |   (msec)   |   (msec)   |  (MB)
-----------+------------+------------+---------
ext4       |   1135     |    394     |   54
erofs dio  |    922     |    401     |   45
ovl dio    |   1810     |    532     |  149
ovl lazy   |   1063     |    523     |   87
cfs        |    768     |    459     |   51

So, while cfs is somewhat worse now for this particular use case, my
overall analysis still stands.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@xxxxxxxxxx         alexander.larsson@xxxxxxxxx




