Re: [announce] thin-provisioning-tools v1.0.0-rc1

Hi Eric,

On Wed, Mar 1, 2023 at 10:26 PM Eric Wheeler <dm-devel@xxxxxxxxxxxxxxxxxx> wrote:

> Hurrah! I've been looking forward to this for a long time...


> ...So if you have any commentary on the future of dm-thin with respect
> to metadata range support, or on dm-thin performance in general, I would
> be very curious about your roadmap and your plans.

The plan over the next few months is roughly:

- Get people using the new Rust tools.  They are _so_ much faster than the old C++ ones. [available now]
- Push upstream a set of patches I've been working on to boost thin concurrency performance.  These are
  nearing completion and are available here for those who are interested: https://github.com/jthornber/linux/tree/2023-02-28-thin-concurrency-7.
  They're making a huge difference to performance in my testing, e.g., fio with 16 jobs running concurrently gets several times the throughput.
  [Upstream in the next month hopefully]
- Change thinp metadata to store ranges rather than individual mappings.  This will reduce the amount of space the metadata consumes, and
  have the knock-on effect of boosting performance slightly (less metadata means faster lookups); there's a rough sketch of the idea below.
  However, I consider this a half-way house, in that I'm only going to change the metadata and not start using ranges within the core
  target (I'm not moving away from fixed block sizes).  [Next 3 months]
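
To give a rough idea of what I mean by ranges, here's a sketch; the field names and layout are purely
illustrative, not the actual on-disk format:

    #include <stdint.h>

    /* Today: roughly one btree entry per mapped block. */
    struct block_mapping {
            uint64_t virt_block;    /* block on the thin device */
            uint64_t data_block;    /* block in the data device */
            uint32_t time;          /* snapshot time stamp */
    };

    /* With ranges: one entry covers a contiguous run of blocks, so a
     * sequentially provisioned device needs far fewer entries. */
    struct range_mapping {
            uint64_t virt_begin;
            uint64_t data_begin;
            uint64_t len;           /* number of blocks in the run */
            uint32_t time;
    };

Fewer entries means less metadata to read and keep cached, which is where the small performance win comes from.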

I don't envisage significant changes to dm-thin or dm-cache after this.


Longer term I think we're nearing a crunch point where we'll drastically change how we do things.  Since I wrote device-mapper in 2001, the speed of
devices has increased so much that I think dm is no longer doing a good job:

- The layering approach introduces inefficiencies with each layer.  Sure, it may only be a 5% hit to add another linear mapping into the stack,
  but those 5% hits add up.
- dm targets only see individual bios rather than the whole request queue.  This prevents a lot of really useful optimisations.
  Think how much smarter dm-cache and dm-thin could be if they could look at the whole queue.
- The targets are getting too complicated.  I think dm-thin is around 8k lines of code, though it shares most of that with dm-cache.
  I understand the dedup target from the vdo guys weighs in at 64k lines.  Kernel development is fantastically expensive (or slow, depending
  on how you want to look at it).  I did a lot of development work on thinp v2, and it was looking a lot like a filesystem shoe-horned into
  the block layer.  I can see why bcache turned into bcachefs.
- Code within the block layer is memory constrained.  We can't make arbitrarily sized allocations within targets; instead we have to use mempools
  of fixed-size objects (frowned upon these days), or declare up front how much memory we need to service a bio, forcing us to assume the worst
  case (sketched below).  This stuff isn't hard, just tedious, and it makes coding sophisticated targets pretty joyless.
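
For anyone who hasn't written a target, the "declare up front" pattern looks roughly like this.  It's a simplified
sketch (names invented, error handling and the rest of the target omitted), not code from a real target:

    #include <linux/device-mapper.h>

    /* Per-bio state; must be sized for the worst case any bio might
     * need, even though most bios (the fast path) never use it. */
    struct per_bio_state {
            sector_t virt_begin;
            sector_t data_begin;
    };

    static int example_ctr(struct dm_target *ti, unsigned int argc, char **argv)
    {
            /* Reserve this much memory for every bio passing through the
             * target, whether or not the slow path ever needs it. */
            ti->per_io_data_size = sizeof(struct per_bio_state);
            return 0;
    }

    static int example_map(struct dm_target *ti, struct bio *bio)
    {
            struct per_bio_state *pb = dm_per_bio_data(bio, sizeof(*pb));

            /* ... remap the bio, stashing any slow-path state in pb ... */
            return DM_MAPIO_REMAPPED;
    }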

So my plan going forwards is to keep the fast path of these targets in the kernel (e.g. a write to a provisioned, unsnapshotted region), but take
the slow paths out to userland.  I think io_uring and ublk have shown us that this is viable.  That way a snapshot copy-on-write, or a dm-cache data
migration, which are very slow operations, can be done with ordinary userland code.  For the fast paths, layering will be removed by having userland
give the kernel instructions to execute for specific regions of the virtual device (i.e. remap to here).  The kernel driver will have nothing
specific to thin/cache etc.  I'm not sure how many of the current dm-targets would fit into this model, but I'm sure thin provisioning, caching,
linear, and stripe can.
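
To give a flavour of what those instructions might look like (purely hypothetical, none of this exists yet):

    #include <stdint.h>

    /* Hypothetical sketch: userland hands the kernel driver a table of
     * entries like this, and the driver executes them without knowing
     * anything about thin or cache semantics. */
    enum region_op {
            REGION_REMAP,   /* fast path: redirect to another device/offset */
            REGION_ZERO,    /* e.g. a read of an unprovisioned thin block */
            REGION_PUNT,    /* slow path: hand the IO to userland */
    };

    struct region_instruction {
            uint64_t virt_begin;    /* sectors on the virtual device */
            uint64_t len;
            uint32_t op;            /* enum region_op */
            uint32_t dest_dev;      /* index into a table of real devices */
            uint64_t dest_begin;    /* sectors on that device */
    };

A write to a provisioned, unsnapshotted region would hit a REGION_REMAP entry and never leave the kernel; a
snapshot copy-on-write would hit REGION_PUNT and be handled by ordinary userland code before the table is updated.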

- Joe






 
> Thanks again for all your great work on this.
>
> -Eric

> [note: _data_ sharing was always maintained, this is purely about metadata space usage]
>
> # thin_metadata_pack/unpack
>
> These are a couple of new tools that are used for support.  They compress
> thin metadata, typically to a tenth of the size (much better than you'd
> get with generic compressors).  This makes it easier to pass damaged
> metadata around for inspection.
>
> # blk-archive
>
> The blk-archive tools were initially part of this thin-provisioning-tools
> package, but have now been split off into their own project:
>
>     https://github.com/jthornber/blk-archive
>
> They allow efficient archiving of thin devices (data deduplication
> and compression), which will be of interest to those of you who are
> holding large numbers of snapshots in thin pools as a poor man's backup.
>
> In particular:
>
>     - Thin snapshots can be used to archive live data.
>     - It avoids reading unprovisioned areas of thin devices.
>     - It can calculate deltas between thin devices to minimise
>       how much data is read and deduped (incremental backups).
>     - Restoring to a thin device tries to maximise data sharing
>       within the thin pool (a big win if you're restoring snapshots).
>
>
