Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability

On 8/7/24 09:49, Daniel P. Berrangé wrote:
On Wed, Aug 07, 2024 at 02:32:57PM +0200, Martin Kletzander wrote:
On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
This series is an RFC for support of QEMU's mapped-ram migration
capability [1] for saving and restoring VMs. It implements the first
part of the design approach we discussed for supporting parallel
save/restore [2]. In summary, the approach is

1. Add mapped-ram migration capability
2. Steal an element from save header 'unused' for a 'features' variable
   and bump save version to 3.
3. Add /etc/libvirt/qemu.conf knob for the save format version,
   defaulting to latest v3
4. Use v3 (aka mapped-ram) by default
5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
6. include: Define constants for parallel save/restore
7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
8. qemu: Add support for parallel restore. Implies mapped-ram.
   Reject if v2
9. tools: add parallel parameter to virsh save command
10. tools: add parallel parameter to virsh restore command

This series implements 1-5, with the BYPASS_CACHE support in patches 8
and 9 being quite hacky. They are included to discuss approaches to make
them less hacky. See the patches for details.
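
For anyone skimming the thread, here is a rough sketch of what step 2
amounts to; the struct below is only an approximation of libvirt's save
image header (field names and sizes are from memory, not copied from the
patches), with one element of the old 'unused' array repurposed as a
'features' bitmap:

#include <stdint.h>

#define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
#define QEMU_SAVE_VERSION 3

/* Hypothetical feature bit carried in the new 'features' field */
#define QEMU_SAVE_FEATURE_MAPPED_RAM (1 << 0)

/* Illustrative layout only: one element of the old 'unused' array is
 * repurposed as 'features', and the header version is bumped to 3 so
 * that older libvirt refuses images it cannot parse. */
typedef struct {
    char magic[sizeof(QEMU_SAVE_MAGIC) - 1];
    uint32_t version;      /* 2 = legacy stream, 3 = has 'features' */
    uint32_t data_len;
    uint32_t was_running;
    uint32_t compressed;
    uint32_t features;     /* stolen from unused[] in v3 */
    uint32_t unused[14];
} virQEMUSaveHeaderSketch;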


They might seem a tiny bit hacky, but I don't think it's that big of a deal.

You could eliminate two conditions by making the first FD always
non-direct (as in either there is no BYPASS_CACHE or it's already
wrapped by the I/O helper), but it would complicate other things in the
code and would get even more hairy IMHO.

The QEMU mapped-ram capability currently does not support direct-io.
Fabiano is working on that now [3]. This complicates merging support
in libvirt. I don't think it's reasonable to enable mapped-ram by
default when BYPASS_CACHE cannot be supported. Should we wait until
the mapped-ram direct-io support is merged in QEMU before supporting
mapped-ram in libvirt?


By the time I looked at this series the direct-io work had already gone
in, but there is still the need for the second descriptor to do some
unaligned I/O.

From the QEMU patches I'm not sure whether you also need to set the
direct-io migration capability/flag when migrating to an fdset.  Maybe
that's only needed when migrating directly into a file.
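
For what it's worth, my understanding of the two-descriptor scheme is
roughly: one O_DIRECT fd for the page-aligned RAM writes and one ordinary
fd for the small, unaligned writes (header, metadata).  A minimal sketch
with plain POSIX calls, not the actual libvirt or QEMU code:

#define _GNU_SOURCE      /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open the save file twice: once with O_DIRECT for page-aligned RAM
 * writes, once buffered for the unaligned writes.  Both fds would then
 * be handed to QEMU (e.g. via an fdset) so it can pick the right one
 * per write. */
static int
open_save_fds(const char *path, int *direct_fd, int *buffered_fd)
{
    *direct_fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (*direct_fd < 0) {
        perror("open O_DIRECT");
        return -1;
    }

    *buffered_fd = open(path, O_WRONLY);
    if (*buffered_fd < 0) {
        perror("open buffered");
        close(*direct_fd);
        return -1;
    }
    return 0;
}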

For the moment, compression is ignored in the new save version.
Currently, libvirt connects the output of QEMU's save stream to the
specified compression program via a pipe. This approach is incompatible
with mapped-ram since the fd provided to QEMU must be seekable. One
option is to reopen and compress the saved image after the actual save
operation has completed. This has the downside of requiring the iohelper
to handle BYPASS_CACHE, which would preclude us from removing it
sometime in the future. Other suggestions much welcomed.
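
The seekability problem is easy to demonstrate: mapped-ram writes RAM
pages at fixed offsets in the image, so QEMU needs an fd it can lseek(),
and the write end of the pipe to the compression program fails that
immediately (ESPIPE).  A tiny illustration (the path is arbitrary):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int pipefd[2];
    int filefd = open("/tmp/save-image", O_RDWR | O_CREAT, 0600);

    if (filefd < 0 || pipe(pipefd) < 0)
        return 1;

    /* A regular file fd is seekable, so QEMU can place RAM pages at
     * fixed offsets in the image. */
    if (lseek(filefd, 4096, SEEK_SET) >= 0)
        printf("file fd: seekable\n");

    /* The write end of the compression pipe is not. */
    if (lseek(pipefd[1], 4096, SEEK_SET) < 0)
        printf("pipe fd: %s\n", strerror(errno));   /* ESPIPE */

    return 0;
}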


I was wondering whether it would make sense to use user-space block I/O,
but then compression would have to happen on a block-by-block basis.
Since each write would need to be compressed separately, you might only
save a few bytes here and there, and because each compressed block still
has to be allocated as a whole, no space would be saved at all.  So that
does not make sense unless there is some new format.

And compression after the save is finished is in my opinion kind of
pointless.  You don't save time and you only save disk space _after_ the
compression step is done.  Not to mention you'd have to uncompress it
again before starting QEMU from it.  I'd be fine with making users
choose between compression and mapped-ram, at least for now.  They can
compress the resulting file on their own.

That argument for compressing on their own applies to the existing
code too. The reason we want it in libvirt is that it makes compression
'just work', without requiring apps to have a login shell on the
hypervisor to run commands out of band.

So basically it depends on whether disk space is more important than
overall wallclock time. It might still be worthwhile if the use of
multifd with mapped-ram massively reduces the overall save duration
and we also had a parallelized compression tool we were spawning,
e.g. xz can be told to use all CPU threads.
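
If we did keep post-save compression, spawning a multi-threaded
compressor on the finished image is cheap to wire up; a hedged sketch
with plain fork/exec rather than libvirt's virCommand helpers ("xz -T0"
uses one worker thread per core, the helper name is made up):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Compress a finished save image with all available CPU threads.
 * "xz -T0" picks one worker thread per core; the helper name here is
 * just for illustration. */
static int
compress_saved_image(const char *path)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child: exec the compressor */
        execlp("xz", "xz", "-T0", path, (char *) NULL);
        _exit(127);                 /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}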

mapped-ram + direct-io + multifd is quite an improvement over the current save/restore mechanism. The following table shows some save/restore stats from a guest with 32G RAM, 30G dirty, and 1 vcpu in a tight loop dirtying memory.

                        | save    | restore |              |
                        | time    | time    | Size (bytes) | Blocks (512B)
------------------------+---------+---------+--------------+--------------
legacy                  | 25.800s | 14.332s | 33154309983  | 64754512
------------------------+---------+---------+--------------+--------------
mapped-ram              | 18.742s | 15.027s | 34368559228  | 64617160
------------------------+---------+---------+--------------+--------------
legacy + direct IO      | 13.115s | 18.050s | 33154310496  | 64754520
------------------------+---------+---------+--------------+--------------
mapped-ram + direct IO  | 13.623s | 15.959s | 34368557392  | 64662040
------------------------+---------+---------+--------------+--------------
mapped-ram + direct IO  |         |         |              |
 + multifd-channels=8   | 6.994s  | 6.470s  | 34368554980  | 64665776
---------------------------------------------------------------------------

In all cases, the save and restore operations are to/from a block device consisting of two NVMe disks in a RAID0 configuration with xfs (~8600 MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were taken from the corresponding fields reported by stat(1).

Regards,
Jim



