Hi Mike:

> I appreciate that this work is being done with an eye toward
> containerd "community" and standardization

> it appears that this format of OCI image storage/use is only
> used by Alibaba?

> But you'd do well to explain why the userspace solution isn't
> acceptable.

Yes, overlaybd has its origins in the container community, but this
work (the kernel modules) does *NOT* actually target containers.
On-demand lazy loading of container images involves complex
interactions with the image registry over HTTP(S), and possibly with
other transport services (HTTP proxy, SOCKS5 proxy, P2P, cache, etc.).
That is better implemented in user space and then exported to the
kernel as a virtual block device, for example through TCMU or ublk.

The user-space implementation of overlaybd has a very large install
base at Alibaba, as well as at some other big companies, including
another major cloud provider. (We had better not reveal their names
before we get their permission.) We are also pleased with the
flexibility of user space, which allows easy integration with various
systems and environments.

We implemented these kernel modules and are trying to contribute them
upstream because we believe they are useful for the device mapper and
LVM ecosystem:

(1) dm-overlaybd essentially implements a generic, redistributable
    snapshot of a block device. This may enable LVM to push/pull
    individual snapshots to/from a globally distributed volume
    repository.

(2) dm-overlaybd is highly efficient. Its index performance does not
    degrade as the number of snapshots increases. In contrast, qcow2
    (dm-qcow2) does not support efficient external snapshots: it has
    O(n) overhead in this case, where n is the number of
    (backing-file) snapshots.

(3) dm-zfile is an efficient, generic compressed block device. This
    allows LVM to support compressed snapshots, saving disk space
    without compromising much performance, and possibly even
    improving performance in some cases.
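
To illustrate the contrast in point (2), here is a minimal Python sketch.
It assumes a toy model in which each snapshot layer is a list of
(start_block, length) extents; the function names and the data layout are
hypothetical and are not the actual overlaybd or qcow2 code or on-disk
format. It only shows why walking a backing chain costs O(n) in the number
of layers, while a lookup in an index merged once ahead of time does not
depend on n.

```python
from bisect import bisect_right

# Toy model (an assumption for illustration only): each layer is a list of
# (start_block, length) extents written by that snapshot. Layer 0 is the
# oldest; the last layer is the newest and wins on overlap.

def lookup_chain(layers, block):
    """Walk the backing chain newest-to-oldest: O(n) layers probed."""
    probes = 0
    for layer_id in range(len(layers) - 1, -1, -1):
        probes += 1
        for start, length in layers[layer_id]:
            if start <= block < start + length:
                return layer_id, probes
    return None, probes

def build_merged_index(layers):
    """Flatten all layers once into sorted, non-overlapping extents."""
    owner = {}
    for layer_id, extents in enumerate(layers):  # newer layers override
        for start, length in extents:
            for b in range(start, start + length):
                owner[b] = layer_id
    merged = []  # entries are (start, end_exclusive, layer_id)
    for b in sorted(owner):
        if merged and merged[-1][1] == b and merged[-1][2] == owner[b]:
            merged[-1] = (merged[-1][0], b + 1, owner[b])
        else:
            merged.append((b, b + 1, owner[b]))
    starts = [start for start, _, _ in merged]
    return starts, merged

def lookup_merged(index, block):
    """Binary search in the merged index: cost independent of layer count."""
    starts, merged = index
    i = bisect_right(starts, block) - 1
    if i >= 0:
        _, end, layer_id = merged[i]
        if block < end:
            return layer_id
    return None

layers = [
    [(0, 8)],  # layer 0 (base): blocks 0..7
    [(2, 2)],  # layer 1: blocks 2..3
    [(3, 3)],  # layer 2 (newest): blocks 3..5
]
index = build_merged_index(layers)
```

With this data, a read of block 0 has to probe all three layers in the
chain walk, whereas the merged index resolves it with one binary search.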

> I also have doubts that this solution is _actually_ more performant
> than a proper filesystem based solution

This proposal is not focused on performance; it is focused on the new
features for dm and LVM described above. Still, I advise you to run
benchmarks and see the results. After all, ext4, xfs and other mature
file systems are highly optimized as well.

> solution that allows page cache sharing

Page cache sharing can be realized with DAX support in the dm targets
(and the inner file system), together with a virtual pmem device
backend.

> There is an active discussion about, and active development effort
> for, using overlayfs + erofs for container images. I'm reluctant to
> merge this DM based container image approach without wider consensus
> from other container stakeholders.

This proposal is intended to help the dm and LVM ecosystem, and is not
related to those file systems. It actually supports all kinds of file
systems with their full capabilities. It is of little use for
containers, where the user-space implementation is more practical.
And nothing prevents the container stakeholders from continuing to
discuss and develop overlayfs, erofs, composefs, etc.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel