Re: [PATCH v5 01/40] dm: add documentation for dm-vdo target

On 1/8/24 10:52, Matthias Kaehlcke wrote:
Hi Matthew,

Thanks for your reply!

On Thu, Jan 04, 2024 at 09:07:07PM -0500, Matthew Sakai wrote:


On 12/28/23 14:16, Matthias Kaehlcke wrote:
Hi,

On Fri, Nov 17, 2023 at 03:59:18PM -0500, Matthew Sakai wrote:
This adds the admin-guide documentation for dm-vdo.

vdo.rst is the guide to using dm-vdo. vdo-design is an overview of the
design of dm-vdo.

Co-developed-by: J. corwin Coburn <corwin@xxxxxxxxxxxxxx>
Signed-off-by: J. corwin Coburn <corwin@xxxxxxxxxxxxxx>
Signed-off-by: Matthew Sakai <msakai@xxxxxxxxxx>
---
   .../admin-guide/device-mapper/vdo-design.rst  | 415 ++++++++++++++++++
   .../admin-guide/device-mapper/vdo.rst         | 388 ++++++++++++++++
   2 files changed, 803 insertions(+)
   create mode 100644 Documentation/admin-guide/device-mapper/vdo-design.rst
   create mode 100644 Documentation/admin-guide/device-mapper/vdo.rst

diff --git a/Documentation/admin-guide/device-mapper/vdo-design.rst b/Documentation/admin-guide/device-mapper/vdo-design.rst
new file mode 100644
index 000000000000..c82d51071c7d
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/vdo-design.rst
@@ -0,0 +1,415 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+================
+Design of dm-vdo
+================
+
+The dm-vdo (virtual data optimizer) target provides inline deduplication,
+compression, zero-block elimination, and thin provisioning. A dm-vdo target
+can be backed by up to 256TB of storage, and can present a logical size of
+up to 4PB.

[snip]

+	block map cache size:
+		The size of the block map cache, as a number of 4096-byte
+		blocks. The minimum and recommended value is 32768 blocks.
+		If the logical thread count is non-zero, the cache size
+		must be at least 4096 blocks per logical thread.

If I understand correctly, the minimum of 32768 blocks results in the 128 MB
metadata cache mentioned in 'Tuning', which allows access to up to 100 GB
of logical space.

Is there a strict reason for this minimum? I'm evaluating the use of vdo on
systems with a relatively small vdo volume (say 4GB) and 'only' 4-8 GB of
RAM. The 128 MB of metadata cache would be a sizeable chunk of that, which
could make the use of vdo infeasible.

The short answer is that VDO can often use a smaller cache than the default,
but it likely won't help in the way you want it to.

+Examples:
+
+Start a previously-formatted vdo volume with 1 GB logical space and 1 GB
+physical space, storing to /dev/dm-1 which has more than 1 GB of space.
+
+::
+
+	dmsetup create vdo0 --table \
+	"0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"
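
(For reference, here is my reading of the positional fields in that table
line, going by vdo.rst -- an annotation for orientation, not an
authoritative description:)

    # The same table line with its positional fields annotated (values
    # unchanged from the example above):
    #
    #   0           logical start, in 512-byte sectors
    #   2097152     logical size, in 512-byte sectors (1 GB)
    #   vdo V4      target name and table line format version
    #   /dev/dm-1   backing storage device
    #   262144      storage device size, in 4096-byte blocks (1 GB)
    #   4096        minimum I/O size, in bytes
    #   32768       block map cache size, in 4096-byte blocks (128 MB)
    #   16380       block map era length (the maximum and recommended value)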

IIUC the backing device needs to be previously formatted. The formatting
fails when the size of the backing device is < 5GB:

vdoformat /dev/loop8
    Minimum required size for VDO volume: 5063921664 bytes
    vdoformat: formatVDO failed on '/dev/loop8': VDO Status: Out of space

That was with 'vdoformat' from https://github.com/dm-vdo/vdo/

It would be great if somewhat smaller devices could be supported.
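
(In case it's useful, a rough sketch of how this can be reproduced with
loop devices; the paths and sizes are just examples:)

    # Sketch: a backing file below the reported ~5GB minimum fails to
    # format, one above it succeeds.
    truncate -s 4G /tmp/vdo-small.img
    vdoformat "$(losetup --find --show /tmp/vdo-small.img)"   # fails: out of space
    truncate -s 6G /tmp/vdo-large.img
    vdoformat "$(losetup --find --show /tmp/vdo-large.img)"   # succeeds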

VDO was designed to handle the challenge of data deduplication in very large
storage pools. It generally is not very useful for very small pools. The
first question to ask is whether VDO can actually provide any value in the
sort of environment you're using. VDO generally takes the strategy of saving
storage space by using extra RAM and CPU cycles. In addition, VDO needs to
track a certain amount of metadata, which reduces the amount of storage
available for actual user data.

For vdoformat, the biggest consideration is the deduplication index and
other metadata, which are basically a fixed cost of about 3.5GB. In order
to be useful, VDO would have to find enough deduplication to make up for
the storage lost to its metadata, so the minimum useful size of a VDO
volume is in the 8-12GB range.
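
To put rough numbers on that (a back-of-the-envelope sketch only, using the
~3.5GB figure above):

    # For a given device size, how much is left for data and how much
    # space savings VDO would need to find just to break even.
    metadata_gb=3.5
    for size_gb in 8 10 12; do
        awk -v m="$metadata_gb" -v s="$size_gb" 'BEGIN {
            printf "%2d GB device: %.1f GB left for data, >= %.0f%% savings needed to break even\n",
                   s, s - m, 100 * m / s }'
    done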

For the block map cache, decreasing the cache size may increase the
frequency of metadata writes, which generally decreases the write throughput
of the VDO device. So the tradeoff is between RAM and write speed.
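
As a rough illustration of that relationship (this assumes the 812-entry
block map pages described in vdo-design.rst, so treat the numbers as
approximate):

    page_bytes=4096        # size of one cached block map page
    entries_per_page=812   # logical 4KB blocks per page (my reading of vdo-design.rst)
    gib=1073741824         # bytes per GiB

    # Logical space covered by the minimum/default cache of 32768 pages
    # (32768 pages of 4KB = 128 MB of RAM):
    echo $(( 32768 * entries_per_page * page_bytes / gib ))     # ~101 GiB, i.e. the
                                                                # "100 GB" Tuning figure

    # Pages needed to keep a 4 GiB logical volume's block map fully cached:
    echo $(( 4 * gib / (entries_per_page * page_bytes) + 1 ))   # ~1292 pages, ~5 MB of cache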

Nothing about the generic structure of VDO would prevent us from producing a
smaller VDO (and in fact we do for some testing purposes), but in a scenario
where you can only expect to save a few gigabytes through deduplication, VDO
is generally more expensive than it is worth.

If you still think this might be worth pursuing, let me know and we can try
to work out a configuration which might suit your goals.

Some more context about my use case:

I'm evaluating the use of VDO for storing a hibernate image; the goal is to
reduce hibernate resume time by loading less data from potentially slow
storage. That's why the volume is relatively small. The image is only
written once per hibernate cycle, and generally after the system has been
idle for a longer time, so the lower write throughput due to a smaller
cache size probably wouldn't be a major concern. The systems might not have
huge amounts of free disk space, so an overhead of ~3.5GB for the
deduplication index would probably rule out the use of VDO.

In the context of this use case, the compression part of VDO seems more
interesting than the deduplication. In the VDO documentation I noticed a
parameter to disable deduplication. Given that, I wonder whether it would
be feasible/reasonable to add an option to vdoformat to omit the
deduplication index.

Do you think VDO might be (made) suitable for this scenario or is it
just not the right tool?

Thanks

Matthias

The primary reason for VDO is its deduplication capability. You can disable deduplication on a VDO target, but you would still be paying the overhead costs of being able to enable it, so I think VDO itself is certainly not the right tool here.

We have considered making a compression-only target, but realistically it would be a completely separate dm target rather than a version of VDO. A compression-only target could drop all the complications of VDO's deduplication aspects, and it could potentially even get better compression by removing some of the constraints imposed by supporting deduplication. Conceptually it's not too hard, I think, but we haven't really done any development work on it, so it wouldn't come into being any time soon. If you think it would be helpful, we can consider prioritizing that work.

For the specific use case you described, it sounds like you've got a pretty good idea of what you need to write already. Have you considered trying to compress that image before writing it, just using file-level compression or something similar? I wonder if being able to load less data from storage is actually a win once you account for the extra computation you would need to decompress the image.
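
Something along these lines might be enough to get a feel for that tradeoff
(just a sketch: the file name and compression level are made up, and zstd is
only one option):

    # Compress the image once on the write path, then time decompression
    # as a stand-in for the extra work at resume.
    zstd -3 -T0 hibernate.img -o hibernate.img.zst
    time zstd -dc hibernate.img.zst > /dev/null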

Matt