On 3/6/23 11:49 PM, Jingbo Xu wrote:
>
>
> On 3/6/23 7:33 PM, Alexander Larsson wrote:
>> On Fri, Mar 3, 2023 at 2:57 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>>>
>>> On Mon, Feb 27, 2023 at 10:22 AM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
>>>> Composefs filesystem. It is an opportunistically sharing, validating
>>>> image-based filesystem, targeting use cases like validated ostree
>>>> rootfs:es, validated container images that share common files, as well
>>>> as other image-based use cases.
>>>>
>>>> During the discussions in the composefs proposal (as seen on LWN[3])
>>>> it has been proposed that (with some changes to overlayfs), similar
>>>> behaviour can be achieved by combining the overlayfs
>>>> "overlay.redirect" xattr with a read-only filesystem such as erofs.
>>>>
>>>> There are pros and cons to both these approaches, and the discussion
>>>> about their respective value has sometimes been heated. We would like
>>>> to have an in-person discussion at the summit, ideally also involving
>>>> more of the filesystem development community, so that we can reach
>>>> some consensus on what is the best approach.
>>>
>>> In order to better understand the behaviour and requirements of the
>>> overlayfs+erofs approach, I spent some time implementing direct support
>>> for erofs in libcomposefs. So, with current HEAD of
>>> github.com/containers/composefs you can now do:
>>>
>>> $ mkcompose --digest-store=objects --format=erofs source-dir image.erofs
>>>
>>> This will produce an object store with the backing files, and an erofs
>>> file with the required overlayfs xattrs, including a made-up one
>>> called "overlay.fs-verity" containing the expected fs-verity digest
>>> for the lower dir. It also adds the required whiteouts to cover the
>>> 00-ff dirs from the lower dir.
>>>
>>> These erofs files are ordered similarly to the composefs files, and we
>>> give similar guarantees about their reproducibility, etc. So, they
>>> should be apples-to-apples comparable with the composefs images.
>>>
>>> Given this, I ran another set of performance tests on the original cs9
>>> rootfs dataset, again measuring the time of `ls -lR`. I also tried to
>>> measure the memory use like this:
>>>
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # systemd-run --scope sh -c 'ls -lR mountpoint > /dev/null; cat $(cat /proc/self/cgroup | sed -e "s|0::|/sys/fs/cgroup|")/memory.peak'
>>>
>>> These are the alternatives I tried:
>>>
>>> xfs:       the source of the image, regular dir on xfs
>>> erofs:     the image.erofs above, on loopback
>>> erofs dio: the image.erofs above, on loopback with --direct-io=on
>>> ovl:       erofs above combined with overlayfs
>>> ovl dio:   erofs dio above combined with overlayfs
>>> cfs:       composefs mount of image.cfs
>>>
>>> All tests use the same objects dir, stored on xfs. The erofs and
>>> overlay implementations are from a stock 6.1.13 kernel, and the
>>> composefs module is from GitHub HEAD.
>>>
>>> I tried loopback both with and without the direct-io option, because
>>> without direct-io enabled the kernel will double-cache the loopbacked
>>> data, as per [1].
>>>
>>> The produced images are:
>>>  8.9M image.cfs
>>> 11.3M image.erofs
>>>
>>> This gives these results:
>>>            | Cold cache | Warm cache | Mem use
>>>            |   (msec)   |   (msec)   |    (MB)
>>> -----------+------------+------------+---------
>>> xfs        |       1449 |        442 |      54
>>> erofs      |        700 |        391 |      45
>>> erofs dio  |        939 |        400 |      45
>>> ovl        |       1827 |        530 |     130
>>> ovl dio    |       2156 |        531 |     130
>>> cfs        |        689 |        389 |      51
>>
>> It has been noted that the readahead done by kernel_read() may cause
>> read-ahead of unrelated data into memory, which skews the results in
>> favour of workloads that consume all the filesystem metadata (such as
>> the ls -lR use case of the above test).
>> In the table above, this favours composefs (which uses kernel_read in
>> some codepaths) as well as non-dio erofs (the non-dio loopback device
>> uses readahead too).
>>
>> I updated composefs to not use kernel_read here:
>> https://github.com/containers/composefs/pull/105
>>
>> And a new kernel patch-set based on this is available at:
>> https://github.com/alexlarsson/linux/tree/composefs
>>
>> The resulting table is now (dropping the non-dio erofs):
>>
>>            | Cold cache | Warm cache | Mem use
>>            |   (msec)   |   (msec)   |    (MB)
>> -----------+------------+------------+---------
>> xfs        |       1449 |        442 |      54
>> erofs dio  |        939 |        400 |      45
>> ovl dio    |       2156 |        531 |     130
>> cfs        |        833 |        398 |      51
>>
>>            | Cold cache | Warm cache | Mem use
>>            |   (msec)   |   (msec)   |    (MB)
>> -----------+------------+------------+---------
>> ext4       |       1135 |        394 |      54
>> erofs dio  |        922 |        401 |      45
>> ovl dio    |       1810 |        532 |     149
>> ovl lazy   |       1063 |        523 |      87
>> cfs        |        768 |        459 |      51
>>
>> So, while cfs is somewhat worse now for this particular use case, my
>> overall analysis still stands.
>>
>
> Hi,
>
> I tested your patch removing kernel_read(), and here are the statistics
> from my environment.
>
>
> Setup
> ======
> CPU: x86_64 Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> Disk: cloud disk, 11800 IOPS upper limit
> OS: Linux v6.2
> FS of backing objects: xfs
>
>
> Image size
> ===========
> 8.6M large.composefs (with --compute-digest)
> 8.9M large.erofs (mkfs.erofs)
>  11M large.cps.in.erofs (mkfs.composefs --compute-digest --format=erofs)
>
>
> Perf of "ls -lR"
> ================
>                                                    | uncached | cached
>                                                    |   (ms)   |  (ms)
> ---------------------------------------------------+----------+--------
> composefs                                          |      519 |    178
> erofs (mkfs.erofs, DIRECT loop)                    |      497 |    192
> erofs (mkfs.composefs --format=erofs, DIRECT loop) |      536 |    199
>
> I tested the performance of "ls -lR" on the whole tree of
> cs9-developer-rootfs. It seems that the performance of erofs (generated
> from mkfs.erofs) is slightly better than that of composefs.
> Meanwhile, the performance of erofs generated from mkfs.composefs is
> slightly worse than that of composefs.
>
> The uncached performance is somewhat different from that reported by
> Alexander Larsson. I think this may be due to the different test
> environment, as my test machine is a high-performance server with a
> cloud disk as storage.
>
> It's just a simple test without further analysis, as it's a bit late for
> me :)
>

Forgot to mention that all erofs images (whether generated by mkfs.erofs
or mkfs.composefs) are mounted with "-o noacl", as composefs has not
implemented ACL support yet.

-- 
Thanks,
Jingbo