> >>> Apart from that, I still fail to get some thoughts (apart from
> >>> unprivileged mounts) how EROFS + overlayfs combination fails on
> >>> automotive real workloads aside from "ls -lR" (readdir + stat).
> >>>
> >>> And eventually we still need overlayfs for most use cases to do
> >>> writable stuffs, anyway, it needs some words to describe why such
> >>> < 1s difference is very very important to the real workload as you
> >>> already mentioned before.
> >>>
> >>> And with overlayfs lazy lookup, I think it can be close to ~100ms or
> >>> better.
> >>>
> >>
> >> If we had an overlay.fs-verity xattr, then I think there are no
> >> individual features lacking for it to work for the automotive usecase
> >> I'm working on. Nor for the OCI container usecase. However, the
> >> possibility of doing something doesn't mean it is the better technical
> >> solution.
> >>
> >> The container usecase is very important in real world Linux use today,
> >> and as such it makes sense to have a technically excellent solution for
> >> it, not just a workable solution. Obviously we all have different
> >> viewpoints of what that is, but these are the reasons why I think a
> >> composefs solution is better:
> >>
> >> * It is faster than all other approaches for the one thing it actually
> >>   needs to do (lookup and readdir performance). Other kinds of
> >>   performance (file i/o speed, etc) is up to the backing filesystem
> >>   anyway.
> >>
> >> Even if there are possible approaches to make overlayfs perform better
> >> here (the "lazy lookup" idea) it will not reach the performance of
> >> composefs, while further complicating the overlayfs codebase. (btw, did
> >> someone ask Miklos what he thinks of that idea?)
> >>
> >
> > Well, Miklos was CCed (now in TO:)
> > I did ask him specifically about relaxing -ouserxattr,metacopy,redirect:
> > https://lore.kernel.org/linux-unionfs/20230126082228.rweg75ztaexykejv@wittgenstein/T/#mc375df4c74c0d41aa1a2251c97509c6522487f96
> > but no response on that yet.
> >
> > TBH, in the end, Miklos really is the one who is going to have the most
> > weight on the outcome.
> >
> > If Miklos is interested in adding this functionality to overlayfs, you are going
> > to have a VERY hard sell, trying to merge composefs as an independent
> > expert filesystem. The community simply does not approve of this sort of
> > fragmentation unless there is a very good reason to do that.
> >
> >> For the automotive usecase we have strict cold-boot time requirements
> >> that make cold-cache performance very important to us. Of course, there
> >> is no simple time requirements for the specific case of listing files
> >> in an image, but any improvement in cold-cache performance for both the
> >> ostree rootfs and the containers started during boot will be worth its
> >> weight in gold trying to reach these hard KPIs.
> >>
> >> * It uses less memory, as we don't need the extra inodes that comes
> >>   with the overlayfs mount. (See profiling data in giuseppes mail[1]).
> >
> > Understood, but we will need profiling data with the optimized ovl
> > (or with the single blob hack) to compare the relevant alternatives.
>
> My little request again, could you help benchmark on your real workload
> rather than "ls -lR" stuff? If your hard KPI is really what as you
> said, why not just benchmark the real workload now and write a detailed
> analysis to everyone to explain it's a _must_ that we should upstream
> a new stacked fs for this?
>

I agree that benchmarking the actual KPI (boot time) will have a much
stronger impact and help to build a much stronger case for composefs if
you can prove that the boot time difference really matters.

In order to test boot time on fair grounds, I prepared for you a POC
branch with overlayfs lazy lookup:
https://github.com/amir73il/linux/commits/ovl-lazy-lowerdata

It is very lightly tested, but should be sufficient for the benchmark.
Note that:
1. You need to opt-in with redirect_dir=lazyfollow,metacopy=on
2. The lazyfollow POC only works with read-only overlay that has
   two lower dirs (1 metadata layer and one data blobs layer)
3. The data layer must be a local blockdev fs (i.e. not a network fs)
4. Only absolute path redirects are lazy (e.g. "/objects/cc/3da...")

These limitations could be easily lifted with a bit more work.
If any of those limitations stand in your way for running the benchmark
let me know and I'll see what I can do.

If there is any issue with the POC branch, please let me know.

Thanks,
Amir.
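
For anyone setting up the benchmark, the opt-in from the notes above could
be assembled roughly as follows. This is only a sketch under assumptions:
the helper name and the example paths are hypothetical, the
redirect_dir=lazyfollow value exists only on the POC branch, and listing
the metadata layer first in lowerdir (so that it sits on top of the data
blobs layer) is assumed rather than stated in the mail.

```python
import shlex


def lazyfollow_mount_cmd(metadata_dir, data_dir, mountpoint):
    """Build a mount(8) argv for a read-only overlay with two lower dirs.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    for path in (metadata_dir, data_dir, mountpoint):
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
        # ':' separates lower dirs and ',' separates mount options, so
        # reject them here instead of dealing with escaping in a sketch.
        if ":" in path or "," in path:
            raise ValueError(f"unsupported character in path {path!r}")

    options = ",".join([
        "ro",                                    # no upperdir/workdir: read-only overlay
        f"lowerdir={metadata_dir}:{data_dir}",   # assumed order: metadata on top, blobs below
        "redirect_dir=lazyfollow",               # POC-branch-only value, per the mail
        "metacopy=on",
    ])
    return ["mount", "-t", "overlay", "overlay", "-o", options, mountpoint]


if __name__ == "__main__":
    # Example paths are placeholders for the metadata and data blob layers.
    print(shlex.join(lazyfollow_mount_cmd("/layers/meta", "/layers/objects", "/mnt/image")))
```

The resulting command has to be run as root on a kernel built from the POC
branch, with the data layer on a local block device filesystem.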