Re: [LSF/MM/BPF TOPIC] Composefs vs erofs+overlay

On Wed, Mar 1, 2023 at 4:47 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> On 2/27/23 6:45 PM, Gao Xiang wrote:
> >
> > (+cc Jingbo Xu and Christian Brauner)
> >
> > On 2023/2/27 17:22, Alexander Larsson wrote:
> >> Hello,
> >>
> >> Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the
> >> Composefs filesystem. It is an opportunistically sharing, validating
> >> image-based filesystem, targeting usecases like validated ostree
> >> rootfs:es, validated container images that share common files, as well
> >> as other image based usecases.
> >>
> >> During the discussions in the composefs proposal (as seen on LWN[3])
> >> it has been proposed that (with some changes to overlayfs), similar
> >> behaviour can be achieved by combining the overlayfs
> >> "overlay.redirect" xattr with a read-only filesystem such as erofs.
> >>
> >> There are pros and cons to both these approaches, and the discussion
> >> about their respective value has sometimes been heated. We would like
> >> to have an in-person discussion at the summit, ideally also involving
> >> more of the filesystem development community, so that we can reach
> >> some consensus on what is the best approach.
> >>
> >> Good participants would be at least: Alexander Larsson, Giuseppe
> >> Scrivano, Amir Goldstein, David Chinner, Gao Xiang, Miklos Szeredi,
> >> Jingbo Xu
> > I'd be happy to discuss this at LSF/MM/BPF this year. We have also
> > identified the root cause of the performance gap:
> >
> > composefs reads symlink-like payload data using
> > cfs_read_vdata_path(), which calls kernel_read() and triggers heuristic
> > readahead of dir data (which also lands in the composefs vdata area
> > together with the payload), so most composefs dir I/O is already done
> > in advance by heuristic readahead.  We think almost all existing
> > in-kernel local fses don't have such heuristic readahead, and if we add
> > similar stuff, EROFS could do better than composefs.
> >
> > We've also tried random stat()s on about 500~1000 files in the tree you
> > shared (rather than just "ls -lR"), and EROFS did almost the same as or
> > better than composefs.  I guess further analysis (including blktrace)
> > could be shown by Jingbo later.
> >
>
> The link path strings and dirents are stored mixed together in a
> so-called vdata (variable data) section[1] in composefs, sometimes even
> in the same block (found by dumping the composefs image).  When doing a
> lookup, composefs resolves the link path.  It reads the link path string
> from the vdata section through kernel_read(), and along with it the
> dirents in the following blocks are also read in by the heuristic
> readahead algorithm in kernel_read().  I believe this greatly benefits
> performance in workloads like "ls -lR".
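
As a rough illustration of the effect described above, here is a toy
Python model (not kernel code). It assumes a fixed readahead window and a
hypothetical layout with the link path in block 0 and the dirents in the
following eight blocks; the kernel's real readahead heuristic is adaptive
and more involved than this.

```python
def count_reads(accesses, readahead_blocks):
    """Count device reads needed to serve a sequence of block accesses.

    accesses: block numbers in the order they are requested.
    readahead_blocks: extra following blocks fetched on every miss
                      (fixed window; an assumption of this toy model).
    """
    cached = set()
    reads = 0
    for block in accesses:
        if block in cached:
            continue  # already cached, no I/O needed
        reads += 1
        # A miss brings in the block itself plus the readahead window.
        cached.update(range(block, block + 1 + readahead_blocks))
    return reads


# Hypothetical layout: link path in block 0, dirents in blocks 1..8.
accesses = list(range(9))
print(count_reads(accesses, 0))  # no readahead: 9 reads
print(count_reads(accesses, 8))  # window covers the dirents: 1 read
```

With no readahead every block costs its own read; once the window spans
the dirent blocks, the single read for the link path pulls them all in.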

This is interesting stuff, and honestly I'm a bit surprised other
filesystems don't try to readahead directory metadata to some degree
too. It seems inherent to all filesystems that they try to pack
related metadata near each other, so readahead would probably be
useful even for read-write filesystems, although even more so for
read-only filesystems (due to lack of fragmentation).

But anyway, this is sort of beside the current issue. There is nothing
inherent in composefs that makes it have to do readahead like this,
and correspondingly, if it is a good idea to do it, erofs could do it
too.
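
To compare the two without the readahead effect dominating, a random
stat() test like the one mentioned in the quoted mail could look roughly
like this. It is only a sketch: the function names are made up for
illustration, and it assumes the path list is collected once, and that
caches are then dropped (or the image is remounted) before timing, since
walking the tree already reads the directories.

```python
import os
import random
import time


def pick_paths(root, count, seed=0):
    """Return up to `count` randomly chosen file paths under `root`.

    Paths are sorted before sampling so the same seed picks the same
    relative files on two mounts of the same tree.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    paths.sort()
    rng = random.Random(seed)
    return rng.sample(paths, min(count, len(paths)))


def time_lstat(paths):
    """lstat() every path; return (number of paths, elapsed seconds)."""
    start = time.perf_counter()
    for path in paths:
        os.lstat(path)
    return len(paths), time.perf_counter() - start
```

The idea would be to run pick_paths() once, save the result, drop caches
(for example via /proc/sys/vm/drop_caches) and then run time_lstat() on
each mount with the same list.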

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@xxxxxxxxxx         alexander.larsson@xxxxxxxxx




