Re: [RFC PATCH] Add new migration flag VIR_MIGRATE_DRY_RUN

Jim Fehlig <jfehlig@xxxxxxxx> · Wed, 14 Nov 2018 11:28:27 -0700

On 11/13/18 3:29 AM, Daniel P. Berrangé wrote:
On Mon, Nov 12, 2018 at 11:33:04AM -0700, Jim Fehlig wrote:
On 11/12/18 4:26 AM, Daniel P. Berrangé wrote:
On Fri, Nov 02, 2018 at 04:34:02PM -0600, Jim Fehlig wrote:
A dry run can be used as a best-effort check that a migration command
will succeed. The destination host will be checked to see if it can
accommodate the resources required by the domain. DRY_RUN will fail if
the destination host is not capable of running the domain. Although a
subsequent migration will likely succeed, the success of DRY_RUN does not
ensure a future migration will succeed. Resources on the destination host
could become unavailable between a DRY_RUN and actual migration.

I'm not really convinced this is a particularly useful concept,
as it is only going to catch a very small number of the reasons
why migration can fail. So you still have to expect the real
migration invocation to have a strong chance of failing.

I agree it is difficult to reliably check that a migration will succeed.
TBH, I was expecting opposition due to libvirt already providing info for
applications to do the check themselves. E.g. as nova has done with
check_can_live_migrate_{source,destination} APIs.

Do you think libvirt provides enough information for an app to determine if
a VM can be migrated between two hosts? Or maybe better asked: What info is
currently missing for an app to reliably check if a VM can be migrated
between two hosts?

There are probably two classes of problem here:

  - Things that would prevent the QEMU process being started.

    * XML points to host resources that don't exist (block devices,
      files, nics, host devs, etc, NUMA/CPU pinning)

    * Use of QEMU features that aren't supported by this QEMU version

    * Insufficient free resources. Principally lack of RAM,
      both normal and huge pages.

    These problems don't really have anything to do with live migration
    as they impact normal guest startup to exactly the same degree.

    Libvirt will already report on the first two problems during
    its normal QEMU setup process. During live migration you'll
    see these problems reported quite quickly in the prepare phase
    before any data is sent.

Right. These are the ones that would be easy to detect with dry run, which I 
envisioned would terminate after the prepare phase.
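
As a rough illustration of the first item (XML pointing to host resources
that don't exist), an app could run something like the following on the
destination host before asking for a migration. This is only a sketch: the
name missing_disk_sources is made up for this example, it looks only at
<disk> sources with a file= or dev= attribute, and it says nothing about
nics, host devs or NUMA/CPU pinning.

```python
import os
import xml.etree.ElementTree as ET


def missing_disk_sources(domain_xml, exists=os.path.exists):
    """Return disk source paths from the domain XML that do not exist here.

    Hypothetical helper: only <disk><source file=.../> and
    <disk><source dev=.../> are examined.
    """
    root = ET.fromstring(domain_xml)
    missing = []
    for source in root.findall("./devices/disk/source"):
        # File-backed disks use file=, block-backed disks use dev=.
        path = source.get("file") or source.get("dev")
        if path and not exists(path):
            missing.append(path)
    return missing
```

An empty list only means the paths exist; it does not mean QEMU can open
them or that they hold the right data.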

    Insufficient resources is really hard to report on with any
    useful accuracy. We can't even predict reliably how much RAM
    any given QEMU config will need, let alone measure whether
    the host is able to provide that much. If you're lucky QEMU
    may simply fail to start due to insufficient RAM/huge pages.
    This would abort the live migration early on before much data
    is sent.

Inability to predict qemu memory overhead is indeed unfortunate. E.g. SEV 
encrypted VMs must (at the moment) have all their memory regions locked: guest 
RAM, ROM(s), pflash, video RAM, and any qemu overhead. The last one is an 
"undecidable problem" (from libvirt docs) and makes it difficult to calculate a 
suitable value for /domain/memtune/hard_limit. If the value is too small the VM 
will fail to start.

nova also has a 'reserved_host_memory_mb' setting which should include the qemu 
overhead IMO. But the docs have no guidance on how to set that, likely because 
there is no known way to reliably calculate the overhead.
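
To make the arithmetic concrete, here is a sketch of how an app might pick a
value for /domain/memtune/hard_limit when every region has to be locked. The
sizes below are placeholders chosen for the example, not recommendations. In
particular the qemu overhead term is a guess, since there is no reliable way
to calculate it, and would have to be tuned per deployment.

```python
def hard_limit_kib(guest_ram_kib, rom_kib, pflash_kib, vram_kib, overhead_kib):
    """Sum the regions that must be locked, plus a guessed qemu overhead."""
    return guest_ram_kib + rom_kib + pflash_kib + vram_kib + overhead_kib


# Example values only (all in KiB); the 1 GiB overhead is an assumption.
limit = hard_limit_kib(
    guest_ram_kib=4 * 1024 * 1024,  # 4 GiB guest RAM
    rom_kib=256,                    # option ROMs
    pflash_kib=4 * 1024,            # 4 MiB pflash
    vram_kib=16 * 1024,             # 16 MiB video RAM
    overhead_kib=1024 * 1024,       # guessed 1 GiB qemu overhead
)
print(f"<hard_limit unit='KiB'>{limit}</hard_limit>")
```

If the overhead guess is too small the VM fails to start, and if it is too
large the limit no longer protects the host, which is why the docs cannot
offer a formula.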

  - Things that interfere with the live migration operation

     * Firewall blocks libvirtd <-> libvirtd comms

     * Firewall blocks QEMU <-> QEMU comms

     * Storage copy is not requested and disks are not
       on shared storage

I think these could be successfully checked in dry run too.

     * Network connectivity won't seamlessly switch for
       guest NICs

     * Bugs in QEMU when loading device state causing
       failure

     * Bugs in libvirt not correctly configuring QEMU
       to ensure stable ABI

     * Live migration never converging

I've no illusions that these can be checked in dry run :-).
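
For the firewall items, a check along these lines is what I have in mind.
It is a minimal sketch using a plain TCP connect, and the function name
tcp_reachable is made up for this example. It can only tell you something
about a port that already has a listener, so it fits the libvirtd <->
libvirtd case (when a TCP or TLS transport is used) better than the QEMU <->
QEMU case, where nothing listens until the prepare phase has started the
destination QEMU.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, tcp_reachable("dst.example.com", 16509) would probe the usual
libvirtd TCP port on a hypothetical destination host. A False result does
not distinguish a firewall from a daemon that is simply not listening.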

    Some of these get seen quite quickly such as firewall
    issues. Bugs in device state are only seen during the
    main data transfer. Problems with storage/network
    setup are only seen when the guest crashes & burns
    after migration is complete & are hard to diagnose
    earlier from libvirt's POV. Apps like nova can
    diagnose this kind of thing better as they have a
    higher level view of the storage/network connectivity
    that libvirt can't see.

    Live migration convergence is the real hard one
    that causes a lot of pain for people. Personally I
    recommend that people use post-copy by default
    to guarantee convergence in finite time, with
    low impact on guest performance. There was an
    interesting presentation at KVM Forum this year
    about doing workload prediction for VMs to identify
    which time/day has a workload that is most friendly
    towards convergence.

Ah, interesting. I've watched some of the videos as they became available on the
YouTube channel and will look for this one.

Even launching QEMU isn't good enough - it has to actually process the
migration data stream for devices to get a good indication of success,
at which point you're basically doing a real migration.

Bummer. I guess that answers my question above: no. It also implies apps
cannot reliably check if a migration will succeed and should instead put
effort into handling errors from an actual migration :-).

Yep, we pretty much have to accept that live migration is going to fail
and work to ensure that when it fails, you don't lose the original
VM.

Thanks for your detailed response, much appreciated.

Regards,
Jim
