We're pleased to announce the first release candidate of v1.0.0 of the
thin-provisioning-tools (which also contains tools for dm-cache and
dm-era).
Please try it out on your test systems and give us feedback, in
particular on the documentation and the build and install process.
https://github.com/jthornber/thin-provisioning-tools
# Rust
This is a complete rewrite of the tools in the Rust language. This switch
was made for three primary reasons:
- Memory safety.
- The type system makes it easier to reason about multithreaded code and we need
to use multiple threads to boost performance.
- Rust is a more expressive language than C++ (e.g., it has proper algebraic data types).
# IO engines
The old C++ tools used libaio for all IO. The Rust version by default
uses traditional synchronous IO. This sounds like a step backwards,
but the use of multiple threads and IO scheduling means there's a big
leap in performance.
In addition, there's a compile-time option to include asynchronous
IO support via io_uring. This engine is slightly faster, but not all
distributions support io_uring at the moment. We've also seen
recent (summer 2022) kernel versions that lose IO notifications, which
leads us to feel that io_uring itself isn't quite ready.
# Performance
We've focussed most of all on thin_check and cache_check performance,
since these regularly get run on system startup. But all the tools
should show significant performance improvements.
Over the years we've collected some gnarly dm-thin metadata examples from
users, e.g., pools using hundreds of thousands of snapshots, or completely
filling the maximum metadata size of 16G. These are great for
benchmarking. For example, running thin_check on my system with one of these files:
    thin_check (v0.9):                  6m
    thin_check (v1.0, sync engine):     28s (12.9 times faster)
    thin_check (v1.0, io_uring engine): 23s (15.6 times faster)
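As a sanity check, the speedup factors quoted above follow directly from the raw timings (6 minutes = 360 seconds); a quick sketch:

```python
# Derive the speedup factors quoted above from the raw thin_check timings.
timings = {
    "v0.9": 6 * 60,         # 6 minutes, in seconds
    "v1.0 (sync)": 28,      # seconds
    "v1.0 (io_uring)": 23,  # seconds
}

baseline = timings["v0.9"]
for name, secs in timings.items():
    print(f"{name}: {secs}s, {baseline / secs:.1f}x faster than v0.9")
```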
# thin_dump/restore retains snapshot sharing
Another issue with previous versions of the tools was that dumping and
restoring thin metadata lost the sharing of the metadata btrees between
snapshots. This meant restored metadata often took up more space, and
in some cases would exceed the 16G metadata limit. This is no longer the case.
[Note: _data_ sharing was always maintained; this is purely about metadata space usage.]
# thin_metadata_pack/unpack
These are a couple of new tools intended for support. They compress
thin metadata, typically to a tenth of its size (much better than you'd
get with generic compressors). This makes it easier to pass damaged
metadata around for inspection.
# blk-archive
The blk-archive tools were initially part of the thin-provisioning-tools
package, but have now been split off into their own project:
https://github.com/jthornber/blk-archive
They allow efficient archiving of thin devices (data deduplication
and compression), which will be of interest to those of you who are
holding large numbers of snapshots in thin pools as a poor man's backup.
In particular:
- Thin snapshots can be used to archive live data.
- It avoids reading unprovisioned areas of thin devices.
- It can calculate deltas between thin devices to minimise
  how much data is read and deduped (incremental backups).
- Restoring to a thin device tries to maximise data sharing
  within the thin pool (a big win if you're restoring snapshots).
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel