Hi Mike,
On 5/24/23 1:28 AM, Mike Snitzer wrote:
On Fri, May 19 2023 at 6:27P -0400,
Du Rui <durui@xxxxxxxxxxxxxxxxx> wrote:
OverlayBD is a novel layered block-level image format, designed for
containers and secure containers and also applicable to virtual machines,
published at USENIX ATC '20:
https://www.usenix.org/system/files/atc20-li-huiba.pdf
OverlayBD already has a userspace implementation as a non-core containerd
sub-project, the accelerated container image service:
https://github.com/containerd/accelerated-container-image
In many circumstances, such as secure container runtimes and mobile
devices, it could be much more efficient to do the decompression and
mapping work in the kernel using the device-mapper framework.
This patch contains a module, dm-overlaybd, which provides two kinds of
targets, dm-zfile and dm-lsmt, to expose a group of block devices
containing an OverlayBD image as an overlaid read-only block device.
Signed-off-by: Du Rui <durui@xxxxxxxxxxxxxxxxx>
<snip, original patch here: [1] >
I appreciate that this work is being done with an eye toward
containerd "community" and standardization but based on my limited
research it appears that this format of OCI image storage/use is only
used by Alibaba? (but I could be wrong...)
But you'd do well to explain why the userspace solution isn't
acceptable. Are there security issues that moving the implementation
to kernel addresses?
I also have doubts that this solution is _actually_ more performant
than a proper filesystem based solution that allows page cache sharing
of container image data across multiple containers.
There is an active discussion about, and active development effort
for, using overlayfs + erofs for container images. I'm reluctant to
merge this DM based container image approach without wider consensus
from other container stakeholders.
But short of reaching wider consensus on the need for these DM
targets, there is nothing preventing you from carrying these changes
in your Alibaba kernel.
Mike
[1]: https://patchwork.kernel.org/project/dm-devel/patch/9505927dabc3b6695d62dfe1be371b12f5bdebf7.1684491648.git.durui@xxxxxxxxxxxxxxxxx/
OverlayBD is a generic solution for overlayable, randomly accessible
read-only block devices. It is part of a container image solution, but
it is not designed only for container images; our team also uses it for
VM images and other data images.
Container images in the OverlayBD format are not used only at Alibaba.
As an open-source containerd solution, it already has users in the
community, and the project also has contributors from the community.
I do like erofs, and I am also looking forward to widely used
filesystem-based container image solutions. But a filesystem-based
container image solution does not conflict with a generic block device
image.
Any filesystem that accesses data via a block device can be used to
create an OverlayBD image, including the widely used ones. With
dm-snapshot or dm-thin providing a writable layer on top of the
read-only block device, block images can be mounted as full-featured
filesystems, 100% compatible with the same filesystems on normal block
devices.
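As a rough illustration of such a writable layer, the sketch below builds
a dm-snapshot table line in the documented
`<start> <length> snapshot <origin> <COW device> <persistent?> <chunksize>`
form, with length and chunk size in 512-byte sectors. The helper
`snapshot_table` and the device paths `/dev/mapper/obd-image` and
`/dev/vdb` are hypothetical; this is not part of the dm-overlaybd patch.

```python
def snapshot_table(origin: str, cow: str, size_sectors: int,
                   persistent: bool = False, chunk_sectors: int = 8) -> str:
    """Return a dm-snapshot table line (hypothetical helper, not from the patch).

    Format: <start> <length> snapshot <origin> <COW device> <P|N> <chunksize>,
    where length and chunksize are counted in 512-byte sectors.
    """
    if size_sectors <= 0:
        raise ValueError("size_sectors must be positive")
    # dm-snapshot requires the chunk size to be a power of two.
    if chunk_sectors <= 0 or chunk_sectors & (chunk_sectors - 1):
        raise ValueError("chunk_sectors must be a power of two")
    # P: persistent COW store; N: non-persistent, writes are lost on teardown.
    flag = "P" if persistent else "N"
    return f"0 {size_sectors} snapshot {origin} {cow} {flag} {chunk_sectors}"


if __name__ == "__main__":
    # Hypothetical: a 10 GiB read-only OverlayBD device, /dev/vdb as COW store.
    print(snapshot_table("/dev/mapper/obd-image", "/dev/vdb", 20971520))
```

The resulting line would be passed to something like
`dmsetup create obd-rw --table "..."`, after which the filesystem inside
the image can be mounted from the new writable device like one on any
other block device.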
In my tests, erofs, btrfs, squashfs, and other filesystems on OverlayBD
perform very well, and in certain circumstances even better than on raw
block devices.
Regarding page cache sharing, many filesystems support DAX on PMEM
devices, which I think might be a way to work around it. Currently, that
related implementation is not part of this module.
Thanks for the reply.
Du Rui
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel