On Mon, Jan 08, 2024 at 10:17:49PM -0500, Matthew Sakai wrote:
>
>
> On 1/8/24 10:52, Matthias Kaehlcke wrote:
> > Hi Matthew,
> >
> > Thanks for your reply!
> >
> > On Thu, Jan 04, 2024 at 09:07:07PM -0500, Matthew Sakai wrote:
> > >
> > >
> > > On 12/28/23 14:16, Matthias Kaehlcke wrote:
> > > > Hi,
> > > >
> > > > On Fri, Nov 17, 2023 at 03:59:18PM -0500, Matthew Sakai wrote:
> > > > > This adds the admin-guide documentation for dm-vdo.
> > > > >
> > > > > vdo.rst is the guide to using dm-vdo. vdo-design is an overview of the
> > > > > design of dm-vdo.
> > > > >
> > > > > Co-developed-by: J. corwin Coburn <corwin@xxxxxxxxxxxxxx>
> > > > > Signed-off-by: J. corwin Coburn <corwin@xxxxxxxxxxxxxx>
> > > > > Signed-off-by: Matthew Sakai <msakai@xxxxxxxxxx>
> > > > > ---
> > > > >  .../admin-guide/device-mapper/vdo-design.rst | 415 ++++++++++++++++++
> > > > >  .../admin-guide/device-mapper/vdo.rst        | 388 ++++++++++++++++
> > > > >  2 files changed, 803 insertions(+)
> > > > >  create mode 100644 Documentation/admin-guide/device-mapper/vdo-design.rst
> > > > >  create mode 100644 Documentation/admin-guide/device-mapper/vdo.rst
> > > > >
> > > > > diff --git a/Documentation/admin-guide/device-mapper/vdo-design.rst b/Documentation/admin-guide/device-mapper/vdo-design.rst
> > > > > new file mode 100644
> > > > > index 000000000000..c82d51071c7d
> > > > > --- /dev/null
> > > > > +++ b/Documentation/admin-guide/device-mapper/vdo-design.rst
> > > > > @@ -0,0 +1,415 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > > > +
> > > > > +================
> > > > > +Design of dm-vdo
> > > > > +================
> > > > > +
> > > > > +The dm-vdo (virtual data optimizer) target provides inline deduplication,
> > > > > +compression, zero-block elimination, and thin provisioning. A dm-vdo target
> > > > > +can be backed by up to 256TB of storage, and can present a logical size of
> > > > > +up to 4PB.
> > >
> > > [snip]
> > >
> > > > > +  block map cache size:
> > > > > +        The size of the block map cache, as a number of 4096-byte
> > > > > +        blocks. The minimum and recommended value is 32768 blocks.
> > > > > +        If the logical thread count is non-zero, the cache size
> > > > > +        must be at least 4096 blocks per logical thread.
> > > >
> > > > If I understand correctly the minimum of 32768 blocks results in the 128 MB
> > > > metadata cache mentioned in 'Tuning', which allows to access up to 100 GB
> > > > of logical space.
> > > >
> > > > Is there a strict reason for this minimum? I'm evaluating to use vdo on
> > > > systems with a relatively small vdo volume (say 4GB) and 'only' 4-8 GB of
> > > > RAM. The 128 MB of metadata cache would be a sizeable chunk of that, which
> > > > could make the use of vdo infeasible.
> > >
> > > The short answer is that VDO can often use a smaller cache than the default,
> > > but it likely won't help in the way you want it to.
> > >
> > > > > +Examples:
> > > > > +
> > > > > +Start a previously-formatted vdo volume with 1 GB logical space and 1 GB
> > > > > +physical space, storing to /dev/dm-1 which has more than 1 GB of space.
> > > > > +
> > > > > +::
> > > > > +
> > > > > +        dmsetup create vdo0 --table \
> > > > > +        "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"
> > > >
> > > > IIUC the backing device needs to be previously formatted. The formatting
> > > > fails when the size of the backing device is < 5GB:
> > > >
> > > >   vdoformat /dev/loop8
> > > >   Minimum required size for VDO volume: 5063921664 bytes
> > > >   vdoformat: formatVDO failed on '/dev/loop8': VDO Status: Out of space
> > > >
> > > > That was with 'vdoformat' from https://github.com/dm-vdo/vdo/
> > > >
> > > > It would be great if somewhat smaller devices could be supported.
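
(To spell out the arithmetic behind my 128 MB / 100 GB statement above, and
how I read the example table line from vdo.rst; the field labels are my own
interpretation of the docs, so please correct me if I have mislabeled
something:

  # "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"
  #  0           start sector
  #  2097152     logical size in 512-byte sectors       = 1 GiB
  #  vdo V4      target name and table format version
  #  /dev/dm-1   backing storage device
  #  262144      storage size in 4096-byte blocks       = 1 GiB
  #  4096        minimum I/O size in bytes
  #  32768       block map cache size in 4096-byte blocks
  #  16380       block map era length
  #
  # Block map cache at the minimum/recommended value:
  #   32768 blocks * 4096 bytes/block = 134217728 bytes = 128 MiB
  # If I remember vdo-design.rst correctly, each cached block map page maps
  # 812 logical blocks, so the cache would cover roughly
  #   32768 * 812 * 4096 bytes ~= 109 * 10^9 bytes,
  # which I take to be the "up to 100 GB" figure from 'Tuning'.)
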
> > > VDO was designed to handle the challenge of data deduplication in very large
> > > storage pools. It generally is not very useful for very small pools. The
> > > first question to ask is whether VDO can actually provide any value in the
> > > sort of environment you're using. VDO generally takes the strategy of saving
> > > storage space by using extra RAM and CPU cycles. In addition, VDO needs to
> > > track a certain amount of metadata, which reduces the amount storage
> > > available for actual user data.
> > >
> > > For vdoformat, the biggest consideration is the deduplication index and
> > > other metadata, which are basically a fixed cost of about 3.5GB. In order
> > > for VDO to be useful, VDO would have to find enough deduplication to make up
> > > for the storage lost to VDO's metadata, so the minimum useful size of a VDO
> > > volume is in the 8-12GB range.
> > >
> > > For the block map cache, decreasing the cache size may increase the
> > > frequency of metadata writes, which generally decreases the write throughput
> > > of the VDO device. So the tradeoff is between RAM and write speed.
> > >
> > > Nothing about the generic structure of VDO would prevent us from producing a
> > > smaller VDO (and in fact we do for some testing purposes), but in a scenario
> > > where you can only expect to save a few gigabytes through deduplication, VDO
> > > is generally more expensive than it is worth.
> > >
> > > If you still think this might be worth pursuing, let me know and we can try
> > > to work out a configuration which might suit your goals.
> >
> > Some more context about my use case:
> >
> > I'm evaluating the use of VDO for storing a hibernate image, the goal is to
> > reduce hibernate resume time by loading less data from potentially slow
> > storage. That's why the volume is relatively small. The image is only
> > written once per hibernate cycle and generally after the system was idle
> > for a longer time, so the lower write throughput due to a smaller cache
> > size probably wouldn't be a major concern. The systems might not have huge
> > amounts of free disk space, an overhead of ~3.5GB for the deduplication
> > index would probably rule out the use of VDO.
> >
> > In the context of this use case the compression part of VDO seems more
> > interesting than the deduplication. In the documentation of VDO I noticed
> > a parameter to disable deduplication. With that I wonder if it would be
> > feasible/reasonable to add an option to vdoformat to omit the deduplication
> > index.
> >
> > Do you think VDO might be (made) suitable for this scenario or is it
> > just not the right tool?
> >
> > Thanks
> >
> > Matthias
>
> The primary reason for VDO is the deduplication capability. You can disable
> deduplication on a VDO target, but you would still be paying the overhead
> costs of being able to enable it. Certainly I think VDO itself is not the
> right tool here.

Ok, that's good to know, thanks.
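
(Putting the numbers from above next to each other, for my own reference:

  deduplication index + other fixed metadata   ~3.5 GB  (per Matthew)
  vdoformat minimum volume size                5063921664 bytes, ~4.7 GiB
  volume size in our use case                  ~4 GB
  block map cache in RAM                       128 MiB minimum

so the fixed on-disk cost alone exceeds the whole volume we would want to
use, before any user data is stored.)
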
> We have considered making a compression-only target, but realistically it
> would be a completely separate dm target and not a version of VDO. A
> compression-only target could remove all the complication of the
> deduplication aspects of VDO, and it could potentially even get better
> compression by removing some of the constraints imposed by supporting
> deduplication. Conceptually it's not too hard, I think, but we haven't
> really done any work developing it so it wouldn't come into being any time
> soon. If you thought it would be helpful then we can consider prioritizing
> that work.

Thanks for the offer to prioritize a compression-only target; that might be
very useful!

A decision about whether hibernate is a priority for Chrome OS in 2024 is
still pending; there should be more clarity within a few weeks. Before that
it's probably best not to ask others to do any significant development work
related to that topic :)

> For the specific use case you described, it sounds like you've got a pretty
> good idea of what you need to write already. Have you considered trying to
> compress that image before writing it, just using file-level compression or
> something similar?

Unfortunately that is not a (straightforward) option. We use uswsusp, but
for the sake of security the kernel writes the image directly to a raw
storage device (a dm-crypt target), so any compression would have to happen
in the kernel.

> I wonder if being able to load less data from storage is actually a win
> once you account for the extra computation you would need to decompress
> the image.

That's a good point; it might be worth some prototyping.

m.
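
P.S. A rough sketch of the prototyping I have in mind, just to get a
first-order feel for read time vs. decompress time in user space (the file
names and zstd settings below are made up for illustration; a real test
should use an actual hibernate image, and the in-kernel path would of
course behave somewhat differently):

  # image.raw: a copy of a real hibernate image (or similarly representative data)
  zstd -3 -T0 image.raw -o image.zst

  # drop the page cache between runs so reads actually hit the storage device
  echo 3 > /proc/sys/vm/drop_caches
  time dd if=image.raw of=/dev/null bs=1M     # uncompressed read
  echo 3 > /proc/sys/vm/drop_caches
  time zstd -dc image.zst > /dev/null         # compressed read + decompress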