On 11.02.2015 17:42, Jiri Denemark wrote: > On Wed, Feb 11, 2015 at 13:51:11 +0100, Michal Privoznik wrote: >> When migrating with storage, libvirt iterates over domain disks and >> instruct qemu to migrate the ones we are interested in (shared, RO and >> source-less disks are skipped). The disks are migrated in series. No >> new disk is transferred until the previous one hasn't been quiesced. >> This is checked on the qemu monitor via 'query-jobs' command. If the >> disk has been quiesced, it practically went from copying its content >> to mirroring state, where all disk writes are mirrored to the other >> side of migration too. Having said that, there's one inherent error in >> the design. The monitor command we use reports only active jobs. So if >> the job fails for whatever reason, we will not see it anymore in the >> command output. And this can happen fairly simply: just try to migrate >> a domain with storage. If the storage migration fails (e.g. due to >> ENOSPC on the destination) we resume the host on the destination and >> let it run on partly copied disk. >> >> The proper fix is what even the comment in the code says: listen for >> qemu events instead of polling. If storage migration changes state an >> event is emitted and we can act accordingly: either consider disk >> copied and continue the process, or consider disk mangled and abort >> the migration. >> >> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >> --- >> src/qemu/qemu_migration.c | 37 +++++++++++++++++-------------------- >> 1 file changed, 17 insertions(+), 20 deletions(-) >> >> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c >> index 4506d87..14a4ec6 100644 >> --- a/src/qemu/qemu_migration.c >> +++ b/src/qemu/qemu_migration.c >> @@ -1739,7 +1739,6 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, > ... >> - /* XXX Frankly speaking, we should listen to the events, >> - * instead of doing this. But this works for now and we >> - * are doing something similar in migration itself anyway */ >> + /* XXX Turn this into virCond someday. */ > > Exactly, this is what I wanted to suggest when seeing this code. The > code should be even simpler with virCond so why not doing the change > right away? I think this could be done as a separate patch (in this > series) to make reviewing the code easier. Exactly. It has not direct influence on this patcheset. The virCond patch should be easy and small though, so after this is merged, I can propose one. But I'd really love to fix the bug asap. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list