For quite a while, I have wanted to write a tool to manage thin volumes that is not based on LVM. The main thing holding me back is that the current dm-thin interface is extremely error-prone. The only per-thin metadata stored by the kernel is a 24-bit thin ID, and userspace must take great care to keep that ID in sync with its own metadata. Failure to do so can result in data loss, data corruption, or even security vulnerabilities. Furthermore, having to suspend a thin volume before taking a snapshot of it creates a critical section during which userspace must be very careful, as I/O or a crash can lead to deadlock.

I believe both of these problems can be solved without overly complicating the kernel implementation.

The metadata problem can be solved by allowing userspace to (1) associate a 256-byte binary blob with each thin volume and (2) easily enumerate the thin volumes in a pool. Even with 16777216 (2^24) thins, the blobs would only use 4 GiB of space (2^24 × 256 bytes = 2^32 bytes), and dm-thin v2 will support far larger metadata volumes. While being able to look up thins by blob would be awesome, I would be okay with just enumerating thins at startup and caching the ID ⇔ blob mapping in userspace, at least if thin IDs become 64-bit so I do not have to worry about reuse. Being able to enumerate the thin volumes would allow me to rely solely on the metadata in the thin pool, without having to manage any metadata in userspace.

Looking at the existing implementation, this seems fairly simple: the current B-tree code already supports arbitrary value sizes, so the blob could be appended to 'struct disk_device_details'. (Requiring the size of the blob to be set at pool creation, or when the pool is empty, is fine.)

The suspend problem can be solved by having the kernel automatically suspend a thin volume before taking a snapshot of it and resume it afterwards.
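To make both proposals concrete, here is a toy, in-memory C model of a pool. It is a sketch under stated assumptions, not dm-thin code: the 256-byte blob size and 64-bit IDs follow the proposal, but struct device_details_v2, create_thin(), create_snap(), enumerate_thins() and blob_space_bytes() are hypothetical names. Only the first four fields mirror the kernel's struct disk_device_details, and __attribute__((packed)) is the GCC/Clang spelling standing in for the kernel's __packed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_SIZE 256          /* hypothetical: fixed at pool creation */
#define MAX_THINS 16           /* toy limit for this model */
#define INVALID_ID UINT64_MAX

/* The four existing fields of struct disk_device_details, with the proposed
 * blob appended. On-disk endianness conversion is omitted. */
struct device_details_v2 {
	uint64_t mapped_blocks;
	uint64_t transaction_id;
	uint32_t creation_time;
	uint32_t snapshotted_time;
	uint8_t blob[BLOB_SIZE];   /* opaque to the kernel, owned by userspace */
} __attribute__((packed));

struct thin {
	bool in_use;
	bool suspended;            /* stands in for the dm device's suspend state */
	uint64_t id;               /* 64-bit, per the proposal */
	struct device_details_v2 details;
};

struct pool {
	struct thin thins[MAX_THINS];
	uint64_t next_id;          /* only ever increases, so IDs are not reused */
};

/* Space taken by one blob per thin. */
static uint64_t blob_space_bytes(uint64_t nr_thins)
{
	return nr_thins * BLOB_SIZE;
}

static struct thin *find_thin(struct pool *p, uint64_t id)
{
	for (size_t i = 0; i < MAX_THINS; i++)
		if (p->thins[i].in_use && p->thins[i].id == id)
			return &p->thins[i];
	return NULL;
}

/* Create an empty thin that carries the caller's blob; returns its ID. */
static uint64_t create_thin(struct pool *p, const uint8_t blob[BLOB_SIZE])
{
	for (size_t i = 0; i < MAX_THINS; i++) {
		struct thin *t = &p->thins[i];
		if (t->in_use)
			continue;
		memset(t, 0, sizeof(*t));
		t->in_use = true;
		t->id = p->next_id++;
		memcpy(t->details.blob, blob, BLOB_SIZE);
		return t->id;
	}
	return INVALID_ID;
}

/* Proposed semantics: the pool suspends the origin itself, takes the
 * snapshot, and resumes the origin on every path, so userspace never
 * holds the origin suspended. */
static uint64_t create_snap(struct pool *p, uint64_t origin_id,
			    const uint8_t blob[BLOB_SIZE])
{
	struct thin *origin = find_thin(p, origin_id);
	if (!origin)
		return INVALID_ID;

	origin->suspended = true;            /* quiesce the origin */
	uint64_t snap_id = create_thin(p, blob);
	if (snap_id != INVALID_ID)           /* toy: share only the block count */
		find_thin(p, snap_id)->details.mapped_blocks =
			origin->details.mapped_blocks;
	origin->suspended = false;           /* resume on success and failure */
	return snap_id;
}

typedef void (*thin_visit_fn)(uint64_t id, const uint8_t *blob, void *ctx);

/* Proposed enumeration: visit every thin so userspace can rebuild its
 * ID <-> blob cache at startup from the pool metadata alone. */
static size_t enumerate_thins(const struct pool *p, thin_visit_fn visit,
			      void *ctx)
{
	size_t n = 0;
	for (size_t i = 0; i < MAX_THINS; i++) {
		if (!p->thins[i].in_use)
			continue;
		visit(p->thins[i].id, p->thins[i].details.blob, ctx);
		n++;
	}
	return n;
}
```

In this model, create_snap() is the only place the origin is ever marked suspended, and it clears the flag on both the success and failure paths, which is the property the proposal asks the kernel to guarantee. At startup, userspace would call enumerate_thins() once and fill its ID ⇔ blob cache from the callback; blob_space_bytes(16777216) gives the 4 GiB figure above.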
Having the kernel suspend and resume the origin itself removes a footgun from the userspace API and should also improve reliability, as it reduces the number of error conditions that can hang the system. Per discussion with Zdenek, having the kernel do this automatically is infeasible for arbitrary device stacks, but snapshotting a thin volume is a common special case where it is feasible.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel