On Thu, 2 Mar 2023, Joe Thornber wrote:
> Hi Eric,
>
> On Wed, Mar 1, 2023 at 10:26 PM Eric Wheeler <dm-devel@xxxxxxxxxxxxxxxxxx> wrote:
> > Hurrah! I've been looking forward to this for a long time...
> >
> > ...So if you have any commentary on the future of dm-thin with respect
> > to metadata range support, or dm-thin performance in general, I would
> > be very curious about your roadmap and your plans.
>
> The plan over the next few months is roughly:
>
> - Get people using the new Rust tools.  They are _so_ much faster than
>   the old C++ ones.  [available now]
> - Push upstream a set of patches I've been working on to boost thin
>   concurrency performance.  These are nearing completion and are
>   available here for those who are interested:
>   https://github.com/jthornber/linux/tree/2023-02-28-thin-concurrency-7
>   These are making a huge difference to performance in my testing, eg,
>   fio with 16 jobs running concurrently gets several times the throughput.
>   [Upstream in the next month hopefully]

It would be nice to get people testing the new improvements.  Do you think
they can make it into the 6.3 merge window that is currently open?

> - Change thinp metadata to store ranges rather than individual mappings.
>   This will reduce the amount of space the metadata consumes, and have
>   the knock-on effect of boosting performance slightly (less metadata
>   means faster lookups).  However, I consider this a half-way house, in
>   that I'm only going to change the metadata and not start using ranges
>   within the core target (I'm not moving away from fixed block sizes).
>   [Next 3 months]

Good idea.

> I don't envisage significant changes to dm-thin or dm-cache after this.

Seems reasonable.

> Longer term I think we're nearing a crunch point where we drastically
> change how we do things.  Since I wrote device-mapper in 2001 the speed
> of devices has increased so much that I think dm is no longer doing a
> good job:
>
> - The layering approach introduces inefficiencies with each layer.
>   Sure, it may only be a 5% hit to add another linear mapping into the
>   stack, but those 5%'s add up.
> - dm targets only see individual bios rather than the whole request
>   queue.  This prevents a lot of really useful optimisations.  Think how
>   much smarter dm-cache and dm-thin could be if they could look at the
>   whole queue.
> - The targets are getting too complicated.  I think dm-thin is around 8k
>   lines of code, though it shares most of that with dm-cache.  I
>   understand the dedup target from the vdo guys weighs in at 64k lines.
>   Kernel development is fantastically expensive (or slow, depending on
>   how you want to look at it).  I did a lot of development work on
>   thinp v2, and it was looking a lot like a filesystem shoe-horned into
>   the block layer.  I can see why bcache turned into bcachefs.

Did thinp v2 get dropped, or did it just turn into the patchset above?

> - Code within the block layer is memory constrained.  We can't make
>   arbitrary sized allocations within targets; instead we have to use
>   mempools of fixed size objects (frowned upon these days), or declare
>   up front how much memory we need to service a bio (forcing us to
>   assume the worst case).
>
> This stuff isn't hard, just tedious, and it makes coding sophisticated
> targets pretty joyless.
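
For anyone following along who hasn't written a target: the "declare up
front" approach Joe describes looks roughly like this today (a from-memory
sketch with made-up names, not the actual dm-thin code):

#include <linux/device-mapper.h>

/* Per-bio state is sized at table construction time, long before any
 * bio arrives, because we can't allocate on demand in the I/O path. */
struct example_per_bio_data {
	sector_t virt_begin;	/* hypothetical slow-path bookkeeping */
	void *scratch;
};

static int example_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	/* Reserve the worst case for every bio, fast path or not. */
	ti->per_io_data_size = sizeof(struct example_per_bio_data);
	return 0;
}

static int example_map(struct dm_target *ti, struct bio *bio)
{
	struct example_per_bio_data *pb =
		dm_per_bio_data(bio, sizeof(struct example_per_bio_data));

	/* ...fill in pb, remap the bio to the backing device... */
	return DM_MAPIO_REMAPPED;
}

Every bio pays for that worst-case reservation whether its path needs it
or not, so I can see why this gets joyless as targets grow.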
> So my plan going forwards is to keep the fast path of these targets in
> kernel (eg, a write to a provisioned, unsnapshotted region), but take
> the slow paths out to userland.

Seems reasonable.

> I think io_uring and ublk have shown us that this is viable.  That way
> a snapshot copy-on-write, or a dm-cache data migration, which are very
> slow operations, can be done with ordinary userland code.

It would be nice to minimize CoW latency somehow if going through userspace
increases it by a notable amount.  CoW on spinning disks is definitely slow,
but NVMe drives are pretty quick to copy a 64k chunk.

> For the fast paths, layering will be removed by having userland give
> the kernel instructions to execute for specific regions of the virtual
> device (ie. remap to here).

Maybe you just answered my question about latency?

> The kernel driver will have nothing specific to thin/cache etc.  I'm not
> sure how many of the current dm targets would fit into this model, but
> I'm sure thin provisioning, caching, linear, and stripe can.

To be clear, linear and stripe would stay in the kernel?

-Eric

>
> - Joe
>
> >
> > Thanks again for all your great work on this.
> > -Eric
>
> > [note: _data_ sharing was always maintained, this is purely about
> > metadata space usage]
> >
> > # thin_metadata_pack/unpack
> >
> > These are a couple of new tools that are used for support.  They compress
> > thin metadata, typically to a tenth of the size (much better than you'd
> > get with generic compressors).  This makes it easier to pass damaged
> > metadata around for inspection.
> >
> > # blk-archive
> >
> > The blk-archive tools were initially part of the thin-provisioning-tools
> > package, but have now been split off into their own project:
> >
> > https://github.com/jthornber/blk-archive
> >
> > They allow efficient archiving of thin devices (data deduplication
> > and compression), which will be of interest to those of you who are
> > holding large numbers of snapshots in thin pools as a poor man's backup.
> >
> > In particular:
> >
> > - Thin snapshots can be used to archive live data.
> > - It avoids reading unprovisioned areas of thin devices.
> > - It can calculate deltas between thin devices to minimise
> >   how much data is read and deduped (incremental backups).
> > - Restoring to a thin device tries to maximise data sharing
> >   within the thin pool (a big win if you're restoring snapshots).
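
P.S.  Re the "userland gives the kernel instructions for specific regions"
idea above: purely as a thought experiment on my end (none of this exists;
the names are invented), I picture the per-region instruction table looking
something like this:

#include <linux/types.h>

/* Hypothetical: one entry per contiguous region of the virtual device,
 * handed to a generic kernel driver by a userland policy daemon. */
enum region_op {
	REGION_REMAP,	/* fast path: remap to (dest_dev, dest_begin) and submit */
	REGION_ZERO,	/* unprovisioned read: return zeroes, no I/O at all */
	REGION_DEFER,	/* slow path: queue the bio up to userland (CoW, migration) */
};

struct region_instruction {
	__u64 virt_begin;	/* start sector in the virtual device */
	__u64 len;		/* length of the region in sectors */
	__u64 dest_begin;	/* start sector in the backing device */
	__u32 dest_dev;		/* index into a table of backing devices */
	__u32 op;		/* enum region_op */
};

If lookups like these stay in the kernel and only the REGION_DEFER entries
ever round-trip to userland, provisioned, unsnapshotted writes shouldn't
see any added latency, which would answer my CoW question above.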
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel