Re: [LSF/MM/BPF TOPIC] Composefs vs erofs+overlay

On 3/6/23 11:49 PM, Jingbo Xu wrote:
> 
> 
> On 3/6/23 7:33 PM, Alexander Larsson wrote:
>> On Fri, Mar 3, 2023 at 2:57 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>>>
>>> On Mon, Feb 27, 2023 at 10:22 AM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
>>>> Composefs filesystem. It is an opportunistically sharing, validating
>>>> image-based filesystem, targeting use cases like validated ostree
>>>> root filesystems, validated container images that share common files,
>>>> and other image-based use cases.
>>>>
>>>> During the discussions of the composefs proposal (as seen on LWN[3])
>>>> it has been proposed that (with some changes to overlayfs) similar
>>>> behaviour can be achieved by combining the overlayfs
>>>> "overlay.redirect" xattr with a read-only filesystem such as erofs.
>>>>
>>>> There are pros and cons to both these approaches, and the discussion
>>>> about their respective value has sometimes been heated. We would like
>>>> to have an in-person discussion at the summit, ideally also involving
>>>> more of the filesystem development community, so that we can reach
>>> some consensus on what the best approach is.
>>>
>>> In order to better understand the behaviour and requirements of the
>>> overlayfs+erofs approach I spent some time implementing direct support
>>> for erofs in libcomposefs. So, with current HEAD of
>>> github.com/containers/composefs you can now do:
>>>
>>> $ mkcomposefs --digest-store=objects --format=erofs source-dir image.erofs
>>>
>>> This will produce an object store with the backing files, and an erofs
>>> file with the required overlayfs xattrs, including a made-up one
>>> called "overlay.fs-verity" containing the expected fs-verity digest
>>> for the lower dir. It also adds the required whiteouts to cover the
>>> 00-ff dirs from the lower dir.
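>>>
>>> For reference, such an image is then meant to be combined with the
>>> object store roughly like this (paths illustrative):
>>>
>>> # mount -t erofs -o loop,ro image.erofs /mnt/image
>>> # mount -t overlay overlay -o ro,metacopy=on,redirect_dir=on,\
>>>   lowerdir=/mnt/image:/path/to/objects /mnt/composed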
>>>
>>> These erofs files are ordered similarly to the composefs files, and we
>>> give similar guarantees about their reproducibility, etc. So, they
>>> should be apples-to-apples comparable with the composefs images.
>>>
>>> Given this, I ran another set of performance tests on the original cs9
>>> rootfs dataset, again measuring the time of `ls -lR`. I also tried to
>>> measure the memory use like this:
>>>
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # systemd-run --scope sh -c 'ls -lR mountpoint > /dev/null; cat $(cat
>>> /proc/self/cgroup | sed -e "s|0::|/sys/fs/cgroup|")/memory.peak'
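>>>
>>> (For reference, the cold and warm cache timings come from runs roughly
>>> like the following, mountpoint illustrative.)
>>>
>>> # sync; echo 3 > /proc/sys/vm/drop_caches   # cold cache run
>>> # time ls -lR /mnt/composed > /dev/null
>>> # time ls -lR /mnt/composed > /dev/null     # warm cache run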
>>>
>>> These are the alternatives I tried:
>>>
>>> xfs: the source of the image, regular dir on xfs
>>> erofs: the image.erofs above, on loopback
>>> erofs dio: the image.erofs above, on loopback with --direct-io=on
>>> ovl: erofs above combined with overlayfs
>>> ovl dio: erofs dio above combined with overlayfs
>>> cfs: composefs mount of image.cfs
>>>
>>> All tests use the same objects dir, stored on xfs. The erofs and
>>> overlay implementations are from a stock 6.1.13 kernel, and the
>>> composefs module is from github HEAD.
>>>
>>> I tried loopback both with and without the direct-io option, because
>>> without direct-io enabled the kernel will double-cache the loopback
>>> data, as per [1].
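>>>
>>> For reference, the direct-io loopback setup is roughly (loop device
>>> name illustrative):
>>>
>>> # losetup --direct-io=on /dev/loop0 image.erofs
>>> # mount -t erofs -o ro /dev/loop0 /mnt/image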
>>>
>>> The produced images are:
>>>  8.9M image.cfs
>>> 11.3M image.erofs
>>>
>>> And the tests give these results:
>>>            | Cold cache | Warm cache | Mem use
>>>            |   (msec)   |   (msec)   |  (MB)
>>> -----------+------------+------------+---------
>>> xfs        |   1449     |    442     |    54
>>> erofs      |    700     |    391     |    45
>>> erofs dio  |    939     |    400     |    45
>>> ovl        |   1827     |    530     |   130
>>> ovl dio    |   2156     |    531     |   130
>>> cfs        |    689     |    389     |    51
>>
>> It has been noted that the readahead done by kernel_read() may cause
>> read-ahead of unrelated data into memory, which skews the results in
>> favour of workloads that consume all the filesystem metadata (such as
>> the ls -lR use case of the above test). In the table above this favours
>> composefs (which uses kernel_read() in some codepaths) as well as
>> non-dio erofs (the non-dio loopback device uses readahead too).
>>
>> I updated composefs to not use kernel_read here:
>>   https://github.com/containers/composefs/pull/105
>>
>> And a new kernel patch-set based on this is available at:
>>   https://github.com/alexlarsson/linux/tree/composefs
>>
>> The resulting table is now (dropping the non-dio erofs):
>>
>>            | Cold cache | Warm cache | Mem use
>>            |   (msec)   |   (msec)   |  (MB)
>> -----------+------------+------------+---------
>> xfs        |   1449     |    442     |   54
>> erofs dio  |    939     |    400     |   45
>> ovl dio    |   2156     |    531     |  130
>> cfs        |    833     |    398     |   51
>>
>> And the same set of tests with ext4 instead of xfs ("ovl lazy" is
>> overlayfs with lazy lowerdata lookup):
>>
>>            | Cold cache | Warm cache | Mem use
>>            |   (msec)   |   (msec)   |  (MB)
>> -----------+------------+------------+---------
>> ext4       |   1135     |    394     |   54
>> erofs dio  |    922     |    401     |   45
>> ovl dio    |   1810     |    532     |  149
>> ovl lazy   |   1063     |    523     |  87
>> cfs        |    768     |    459     |  51
>>
>> So, while cfs is somewhat worse now for this particular use case, my
>> overall analysis still stands.
>>
> 
> Hi,
> 
> I tested your patch removing kernel_read(), and here are the statistics
> from my test environment.
> 
> 
> Setup
> ======
> CPU: x86_64 Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> Disk: cloud disk, 11800 IOPS upper limit
> OS: Linux v6.2
> FS of backing objects: xfs
> 
> 
> Image size
> ===========
> 8.6M large.composefs (with --compute-digest)
> 8.9M large.erofs (mkfs.erofs)
> 11M  large.cps.in.erofs (mkfs.composefs --compute-digest --format=erofs)
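> 
> (These were generated roughly as follows; the source dir name is
> illustrative:
> 
> $ mkfs.composefs --compute-digest cs9-rootfs large.composefs
> $ mkfs.erofs large.erofs cs9-rootfs
> $ mkfs.composefs --compute-digest --format=erofs cs9-rootfs large.cps.in.erofs
> )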
> 
> 
> Perf of "ls -lR"
> ================
>                                                     | uncached | cached
>                                                     |   (ms)   |  (ms)
> ----------------------------------------------------+----------+--------
> composefs                                           |   519    |  178
> erofs (mkfs.erofs, DIRECT loop)                     |   497    |  192
> erofs (mkfs.composefs --format=erofs, DIRECT loop)  |   536    |  199
> 
> I tested the performance of "ls -lR" on the whole tree of the
> cs9-developer-rootfs.  It seems that the performance of erofs (generated
> from mkfs.erofs) is slightly better than that of composefs, while the
> performance of erofs generated from mkfs.composefs is slightly worse
> than that of composefs.
> 
> The uncached performance differs somewhat from that given by Alexander
> Larsson.  I think it may be due to the different test environment, as
> my test machine is a server with robust performance, using a cloud disk
> as storage.
> 
> It's just a simple test without further analysis, as it's a bit late for
> me :)
> 

Forgot to mention that all erofs images (whether generated from
mkfs.erofs or mkfs.composefs) are mounted with "-o noacl", as composefs
has not implemented ACL support yet.
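
That is, each erofs image was mounted along these lines (loop device
name illustrative):

# losetup --direct-io=on /dev/loop0 large.erofs
# mount -t erofs -o ro,noacl /dev/loop0 /mnt/erofs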


-- 
Thanks,
Jingbo


