Re: [Questions] non-shared disk migration: jobs abort and bandwidth

Peter Krempa <pkrempa@xxxxxxxxxx> · Wed, 8 Jun 2022 12:49:52 +0200

On Wed, Jun 08, 2022 at 17:32:57 +0800, Han Han wrote:
> Hi developers,
> Recently, I have been researching migration with non-shared disks (flags
> VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC).
> As we know, the non-shared disk migration could have block jobs to copy the
> disk image from the src host to the dst host. So here are my questions for
> non-shared disk migration:
> q1. For the API virDomainMigrate3 with the bandwidth param, could it set
> the bandwidth of block jobs?
> q2. For the API virDomainMigrateSetMaxSpeed, could it set the bandwidth of
> block jobs?
> q3. For the domain job abort API virDomainAbortJob, could it stop the block
> job of non-shared disk migration?
> q4. For the block job bandwidth API virDomainBlockJobSetSpeed, could it set
> the block job of non-shared disk migration?
> q5. For the block job abort API virDomainBlockJobAbort, could it stop the
> block job of non-shared disk migration?
> 
> 
> 
> Then I got the test results of libvirt-8.4.0-1.el9.x86_64
> qemu-kvm-7.0.0-4.el9.x86_64:
> q1: The bandwidth limit of virDomainMigrate3 is effective to the blockjob:
> ➜  ~ virsh migrate OVMF qemu+ssh://root@hhan-rhel9--1/system --live --p2p
> --tls --tls-destination hhan-rhel9--1 --copy-storage-all --disks-uri
> tcp://hhan-rhel9--1:49156 --bandwidth 2
> ➜  ~ virsh blockjob OVMF vda
> Block Copy: [  0 %]    Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)

This is expected and desired.
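
The numbers also line up: virsh takes --bandwidth in MiB/s, and
2 MiB/s is 2 * 1024 * 1024 = 2097152 bytes/s, which is exactly the limit
reported for the block copy job. If you want to check this from a test
script, a minimal sketch could look like the following. It is not part of
libvirt; the helper name parse_blockjob_limit is made up for illustration,
and the regular expression only assumes the output format quoted above.

```python
import re

MIB = 1024 * 1024  # virsh --bandwidth is given in MiB/s


def parse_blockjob_limit(line):
    """Return the bandwidth limit in bytes/s from a `virsh blockjob` info
    line, or None if the line carries no limit (hypothetical helper)."""
    m = re.search(r"Bandwidth limit:\s*(\d+)\s*bytes/s", line)
    return int(m.group(1)) if m else None


# Line copied from the q1 output quoted above.
line = "Block Copy: [  0 %]    Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)"
limit = parse_blockjob_limit(line)

# True when the block job limit equals the 2 MiB/s passed to virsh migrate.
print(limit == 2 * MIB)
```

The same comparison against 8 * MIB is what fails in your q2 test.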

> q2: The virDomainMigrateSetMaxSpeed doesn't change the bandwidth of
> block jobs.
> ➜  ~ virsh migrate-setspeed OVMF 8
> 
> ➜  ~ virsh blockjob OVMF vda
> Block Copy: [  9 %]    Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)

This is a bug, though. Since we want the global migration speed setting
to cover disks too, setting the migration speed should also apply to the
disk migration streams.

> q3: The virDomainAbortJob could stop a block job of non-shared disk
> migration
> ➜  ~ virsh migrate OVMF qemu+ssh://root@hhan-rhel9--1/system --live --p2p
> --tls --tls-destination hhan-rhel9--1 --copy-storage-all --disks-uri
> tcp://hhan-rhel9--1:49156 --bandwidth 2
> Then start a virsh event on another terminal:
> ➜  ~ virsh event --loop --all
> 
> Abort the domain job:
> ➜  ~ virsh domjobabort OVMF
> 
> The error "error: operation aborted: migration out: canceled by client"
> appears at the terminal of "virsh migrate"
> The terminal of "virsh event" shows the block job has failed:
> event 'block-job' for domain 'OVMF': Block Copy for
> /var/lib/libvirt/images/OVMF.qcow2 failed
> event 'block-job-2' for domain 'OVMF': Block Copy for vda failed

This is again expected: the blockjobs are started by the migration, so
when you cancel the migration we also need to cancel the blockjobs.

> q4: The block job bandwidth of non-shared disk migration cannot be set by
> virDomainBlockJobSetSpeed:
> ➜  ~ virsh blockjob OVMF vda --bandwidth 10
> error: Timed out during operation: cannot acquire state change lock (held
> by monitor=remoteDispatchDomainMigratePerform3Params)

This is okay, but we could take it as a feature request to allow tuning
of the individual blockjobs.

> q5: The block job of non-shared disk migration cannot be aborted by
> virDomainBlockJobAbort:
> ➜  ~ virsh blockjob OVMF vda --abort
> error: Timed out during operation: cannot acquire state change lock (held
> by monitor=remoteDispatchDomainMigratePerform3Params)

This is expected. Same as above, we don't want to allow users to
control this. In contrast to 'q4' I'd refuse an RFE to allow cancelling
of individual jobs.
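
Until the docs spell this out, a script that drives the blockjob APIs
during a migration can at least recognise this situation from the error
text. The following is only a rough sketch: lock_holder is a made-up
helper, and the pattern is taken from the message quoted in q4/q5, so other
libvirt versions may word it differently.

```python
import re


def lock_holder(stderr_text):
    """Return the name of the API holding the state change lock, or None
    if the text is not a 'cannot acquire state change lock' error
    (hypothetical helper, not a libvirt function)."""
    # The message may be wrapped across lines, so collapse whitespace first.
    text = " ".join(stderr_text.split())
    m = re.search(
        r"cannot acquire state change lock \(held by monitor=(\w+)\)", text
    )
    return m.group(1) if m else None


# Error text copied from the q4/q5 output quoted above.
err = """error: Timed out during operation: cannot acquire state change lock (held
by monitor=remoteDispatchDomainMigratePerform3Params)"""

holder = lock_holder(err)
# A holder containing "Migrate" suggests the block job belongs to a
# running migration and cannot be tuned or aborted on its own.
owned_by_migration = holder is not None and "Migrate" in holder
print(holder, owned_by_migration)
```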

> Are the results above expected?
> Here are my personal thoughts:
> For the bandwidth in q1 and q2, they are commented as migration bandwidth(
> https://gitlab.com/libvirt/libvirt/-/blob/master/include/libvirt/libvirt-domain.h#L1165
> ,
> https://gitlab.com/libvirt/libvirt/-/blob/master/src/libvirt-domain.c#L9696
> ), but one works for block jobs while one doesn't. So we should make the
> comment clear whether they are the bandwidth of VM migration or the
> bandwidth of migration with blockjobs. What's more, add a flag to
> virDomainMigrateMaxSpeedFlags to support setting the bandwidth of the
> blockjobs in migration.
> For q4 and q5, if we will not support changing the block job of non-shared
> disk migration through the blockjob APIs, we should note that in the
> migration doc or the block job doc, to show the difference between this
> type of block job and the others.