(+cc Jingbo Xu and Christian Brauner)

On 2023/2/27 17:22, Alexander Larsson wrote:
Hello,

Recently Giuseppe Scrivano and I have worked on[1] and proposed[2] the Composefs filesystem. It is an opportunistically sharing, validating image-based filesystem, targeting use cases like validated ostree rootfs:es, validated container images that share common files, as well as other image-based use cases.

During the discussions in the composefs proposal (as seen on LWN[3]) it has been proposed that (with some changes to overlayfs), similar behaviour can be achieved by combining the overlayfs "overlay.redirect" xattr with a read-only filesystem such as erofs.

There are pros and cons to both these approaches, and the discussion about their respective value has sometimes been heated. We would like to have an in-person discussion at the summit, ideally also involving more of the filesystem development community, so that we can reach some consensus on what is the best approach.

Good participants would be at least: Alexander Larsson, Giuseppe Scrivano, Amir Goldstein, David Chinner, Gao Xiang, Miklos Szeredi, Jingbo Xu
I'd be happy to discuss this at LSF/MM/BPF this year.

We have also identified the root cause of the performance gap: composefs reads symlink-like payload data with cfs_read_vdata_path(), which calls kernel_read() and triggers heuristic readahead of dir data (which also lands in the composefs vdata area together with the payload), so most composefs dir I/O is already done in advance by heuristic readahead. We think almost none of the existing in-kernel local fses has such heuristic readahead, and if we added something similar, EROFS could do better than composefs.

We have also tried random stat()s on about 500~1000 files in the tree you shared (rather than just "ls -lR"), and EROFS did almost the same as or better than composefs. I guess further analysis (including blktrace) could be shown by Jingbo later.

Not sure if Christian Brauner would like to discuss this new stacked fs with on-disk metadata as well (especially the userns stuff, since it's somewhat of a plan in the composefs roadmap as well).

Thanks,
Gao Xiang
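
For readers who want to try the random stat() comparison mentioned above, here is a rough sketch of how such a test could be scripted. It is not the script used for the numbers in this thread; the function names (pick_sample, time_lstat), the default sample size, and the fixed seed are all made up for illustration. It also assumes a cold-cache measurement is wanted, which the mail does not state: the sample list is built in one step, and the timed pass is a separate step so that caches can be dropped (for example via /proc/sys/vm/drop_caches) or the image remounted in between.

```python
import os
import random
import time


def pick_sample(root, sample_size=1000, seed=0):
    """Walk `root` once and return a reproducible random sample of paths.

    Hypothetical helper: walking the tree warms the dentry/inode caches,
    so for a cold-cache run the returned list should be saved, and the
    caches dropped or the filesystem remounted before timing.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    rng = random.Random(seed)  # fixed seed: same sample on every filesystem
    return rng.sample(paths, min(sample_size, len(paths)))


def time_lstat(paths):
    """lstat() every path in order and return (count, elapsed_seconds)."""
    start = time.perf_counter()
    for path in paths:
        os.lstat(path)  # lstat so symlinks themselves are measured
    return len(paths), time.perf_counter() - start
```

Using the same seed and the same tree contents on both mounts keeps the sampled paths comparable, provided the paths are taken relative to each mount point. Pairing the timed pass with blktrace would show how much of the directory I/O was already issued by readahead.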
[1] https://github.com/containers/composefs
[2] https://lore.kernel.org/lkml/cover.1674227308.git.alexl@xxxxxxxxxx/
[3] https://lwn.net/SubscriberLink/922851/45ed93154f336f73/