On Fri, 2012-11-02 at 15:32 +0100, Jiri Denemark wrote:
> On Fri, Nov 02, 2012 at 09:30:27 +0800, liguang wrote:
> > The original migration code was not aware of the offline case, so try
> > to support offline migration quietly (without disturbing the original
> > migration) by passing the VIR_MIGRATE_OFFLINE flag to the migration
> > APIs only if the domain is really inactive, so that the migration
> > process is not puzzled by the domain being offline and does not exit
> > unexpectedly. These changes do not take care of the disk images the
> > domain requires, since they can be transferred by other APIs as
> > suggested; therefore VIR_MIGRATE_OFFLINE should not be combined with
> > VIR_MIGRATE_NON_SHARED_*. If you want a persistent migration, you
> > should run "virsh migrate --persistent" yourself.
> >
> > v12:
> > rebased due to a conflict with commit
> > 2f3e2c0c434218a3d656c08779cb98b327170e11, and took in some messages
> > from Doug Goldstein's patch
> > https://www.redhat.com/archives/libvir-list/2012-October/msg00957.html
> >
> > Signed-off-by: liguang <lig.fnst@xxxxxxxxxxxxxx>
> > ---
> >  include/libvirt/libvirt.h.in |    1 +
> >  src/qemu/qemu_driver.c       |   15 +++++++++
> >  src/qemu/qemu_migration.c    |   66 ++++++++++++++++++++++++++++++++++++-----
> >  src/qemu/qemu_migration.h    |    3 +-
> >  tools/virsh-domain.c         |   10 ++++++
> >  5 files changed, 86 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
> > index 2b17cef..a8f641f 100644
> > --- a/include/libvirt/libvirt.h.in
> > +++ b/include/libvirt/libvirt.h.in
> > @@ -1089,6 +1089,7 @@ typedef enum {
> >                                     * whole migration process; this will be used automatically
> >                                     * when supported */
> >      VIR_MIGRATE_UNSAFE            = (1 << 9), /* force migration even if it is considered unsafe */
> > +    VIR_MIGRATE_OFFLINE           = (1 << 10), /* offline migrate */
> >  } virDomainMigrateFlags;
> >
> >  /* Domain migration. */
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 267bbf1..f5c05e4 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -9759,6 +9759,20 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
> >      }
> >
> >      if (!virDomainObjIsActive(vm)) {
> > +        if (flags & VIR_MIGRATE_OFFLINE) {
> > +            if (flags & (VIR_MIGRATE_NON_SHARED_DISK|
> > +                         VIR_MIGRATE_NON_SHARED_INC)) {
> > +                virReportError(VIR_ERR_OPERATION_INVALID,
> > +                               "%s", _("offline migration cannot handle non-shared storage"));
> > +                goto endjob;
> > +            }
> > +            if (!(flags & VIR_MIGRATE_PERSIST_DEST)) {
> > +                virReportError(VIR_ERR_OPERATION_INVALID,
> > +                               "%s", _("offline migration must be specified with the persistent flag set"));
> > +                goto endjob;
> > +            }
> > +            goto offline;
> > +        }
> >          virReportError(VIR_ERR_OPERATION_INVALID,
> >                         "%s", _("domain is not running"));
> >          goto endjob;
> > @@ -9771,6 +9785,7 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
> >      if (qemuDomainCheckEjectableMedia(driver, vm, asyncJob) < 0)
> >          goto endjob;
> >
> > +offline:
> >      if (!(xml = qemuMigrationBegin(driver, vm, xmlin, dname,
> >                                     cookieout, cookieoutlen,
> >                                     flags)))
>
> I think this label is pretty confusing since the code after it is not just for
> offline migration. Not to mention that this label is used to just skip calling
> qemuDomainCheckEjectableMedia(). I think it would look better as
>
>     bool offline = !!(flags & VIR_MIGRATE_OFFLINE);
>     ...
>
>     if (!virDomainObjIsActive(vm)) {
>         if (offline) {
>             if (flags & (VIR_MIGRATE_NON_SHARED_DISK|
>                          VIR_MIGRATE_NON_SHARED_INC)) {
>                 virReportError(VIR_ERR_OPERATION_INVALID,
>                                "%s", _("offline migration cannot handle non-shared storage"));
>                 goto endjob;
>             }
>             if (!(flags & VIR_MIGRATE_PERSIST_DEST)) {
>                 virReportError(VIR_ERR_OPERATION_INVALID,
>                                "%s", _("offline migration must be specified with the persistent flag set"));
>                 goto endjob;
>             }
>         } else {
>             virReportError(VIR_ERR_OPERATION_INVALID,
>                            "%s", _("domain is not running"));
>             goto endjob;
>         }
>     }
>
>     /* Check if there is any ejected media.
>      * We don't want to require them on the destination.
>      */
>     if (!offline &&
>         qemuDomainCheckEjectableMedia(driver, vm, asyncJob) < 0)
>         goto endjob;
>
>     if (!(xml = qemuMigrationBegin(driver, vm, xmlin, dname,
>                                    cookieout, cookieoutlen,
>                                    flags)))
>         goto endjob;
>
> Oh. And this whole code is actually at the wrong place. It should be placed in
> qemuMigrationBegin() so that it's also called in case of peer-to-peer
> migration. And that also applies to qemuDomainCheckEjectableMedia(), but that
> should be moved first as a separate patch since it is a preexisting bug not
> related to offline migration.

right, I can move this code snippet to qemuMigrationBegin(), and do some
work to prettify it.
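Maybe something like this (just an untested sketch; where exactly it sits
relative to the existing checks in qemuMigrationBegin() is a guess):

    /* untested sketch: validation moved into qemuMigrationBegin() so that
     * it also covers the peer-to-peer path */
    if (!virDomainObjIsActive(vm)) {
        if (!(flags & VIR_MIGRATE_OFFLINE)) {
            virReportError(VIR_ERR_OPERATION_INVALID,
                           "%s", _("domain is not running"));
            goto cleanup;
        }
        if (flags & (VIR_MIGRATE_NON_SHARED_DISK |
                     VIR_MIGRATE_NON_SHARED_INC)) {
            virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                           _("offline migration cannot handle non-shared storage"));
            goto cleanup;
        }
        if (!(flags & VIR_MIGRATE_PERSIST_DEST)) {
            virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                           _("offline migration must be specified with the persistent flag set"));
            goto cleanup;
        }
    }

from the user's side the whole feature then boils down to something like
"virsh migrate --offline --persistent <domain> <desturi>" (assuming the new
virsh option added in tools/virsh-domain.c is named --offline).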
> > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > index e0acc20..85fc48b 100644
> > --- a/src/qemu/qemu_migration.c
> > +++ b/src/qemu/qemu_migration.c
> > @@ -72,6 +72,7 @@ enum qemuMigrationCookieFlags {
> >      QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE,
> >      QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT,
> >      QEMU_MIGRATION_COOKIE_FLAG_NETWORK,
> > +    QEMU_MIGRATION_COOKIE_FLAG_OFFLINE,
> >
> >      QEMU_MIGRATION_COOKIE_FLAG_LAST
> >  };
> > @@ -79,13 +80,14 @@ enum qemuMigrationCookieFlags {
> >  VIR_ENUM_DECL(qemuMigrationCookieFlag);
> >  VIR_ENUM_IMPL(qemuMigrationCookieFlag,
> >                QEMU_MIGRATION_COOKIE_FLAG_LAST,
> > -              "graphics", "lockstate", "persistent", "network");
> > +              "graphics", "lockstate", "persistent", "network", "offline");
> >
> >  enum qemuMigrationCookieFeatures {
> >      QEMU_MIGRATION_COOKIE_GRAPHICS   = (1 << QEMU_MIGRATION_COOKIE_FLAG_GRAPHICS),
> >      QEMU_MIGRATION_COOKIE_LOCKSTATE  = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE),
> >      QEMU_MIGRATION_COOKIE_PERSISTENT = (1 << QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT),
> >      QEMU_MIGRATION_COOKIE_NETWORK    = (1 << QEMU_MIGRATION_COOKIE_FLAG_NETWORK),
> > +    QEMU_MIGRATION_COOKIE_OFFLINE    = (1 << QEMU_MIGRATION_COOKIE_FLAG_OFFLINE),
> >  };
> >
> >  typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics;
> > @@ -594,6 +596,9 @@ qemuMigrationCookieXMLFormat(struct qemud_driver *driver,
> >      if ((mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) && mig->network)
> >          qemuMigrationCookieNetworkXMLFormat(buf, mig->network);
> >
> > +    if (mig->flags & QEMU_MIGRATION_COOKIE_OFFLINE)
> > +        virBufferAsprintf(buf, "  <offline/>\n");
> > +
> >      virBufferAddLit(buf, "</qemu-migration>\n");
> >      return 0;
> >  }
> > @@ -874,6 +879,11 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
> >          (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt))))
> >          goto error;
> >
> > +    if ((flags & QEMU_MIGRATION_COOKIE_OFFLINE)) {
> > +        if (virXPathBoolean("count(./offline) > 0", ctxt))
> > +            mig->flags |= QEMU_MIGRATION_COOKIE_OFFLINE;
> > +    }
> > +
> >      return 0;
> >
> >  error:
> > @@ -938,6 +948,12 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig,
> >          return -1;
> >      }
> >
> > +    if (flags & QEMU_MIGRATION_COOKIE_OFFLINE) {
> > +        mig->flags |= QEMU_MIGRATION_COOKIE_OFFLINE;
> > +    }
> > +
> > +
> > +
>
> Too much space here, one empty line is enough :-)

yes, will squash.

> >      if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig)))
> >          return -1;
> >
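so with the format and parse hunks above, a cookie baked for an offline
domain would just carry one extra empty element, e.g. (abbreviated sketch,
the other cookie fields elided):

    <qemu-migration>
      <name>guest</name>
      <!-- uuid, hostname and other cookie fields elided -->
      <offline/>
    </qemu-migration>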
> > @@ -1443,6 +1459,13 @@ char *qemuMigrationBegin(struct qemud_driver *driver,
> >                                  QEMU_MIGRATION_COOKIE_LOCKSTATE) < 0)
> >          goto cleanup;
> >
> > +    if (flags & VIR_MIGRATE_OFFLINE) {
> > +        if (qemuMigrationBakeCookie(mig, driver, vm,
> > +                                    cookieout, cookieoutlen,
> > +                                    QEMU_MIGRATION_COOKIE_OFFLINE) < 0)
> > +            goto cleanup;
> > +    }
> > +
> >      if (xmlin) {
> >          if (!(def = virDomainDefParseString(driver->caps, xmlin,
> >                                              QEMU_EXPECTED_VIRT_TYPES,
> > @@ -1607,6 +1630,15 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
> >          goto endjob;
> >      }
> >
> > +    if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
> > +                                       QEMU_MIGRATION_COOKIE_OFFLINE)))
> > +        return ret;
> > +
>
> Oops, the return statement will leave a big mess uncleaned. You need to goto
> endjob instead.
>
> > +    if (mig->flags & QEMU_MIGRATION_COOKIE_OFFLINE) {
> > +        ret = 0;
> > +        goto cleanup;
> > +    }
> > +
>
> This goto cleanup will result in the warning reported by Daniel. Prepare phase
> needs to call qemuMigrationJobContinue() on successful return. That is, you
> need to do "goto done;" and add a new "done:" label just above the following
> code (around line 1690 on current HEAD):

good! that's the reason the warning was spat out, thanks!

>     /* add "done:" here */
>     /* We keep the job active across API calls until the finish() call.
>      * This prevents any other APIs being invoked while incoming
>      * migration is taking place.
>      */
>     if (!qemuMigrationJobContinue(vm)) {
>         vm = NULL;
>         virReportError(VIR_ERR_OPERATION_FAILED,
>                        "%s", _("domain disappeared"));
>         goto cleanup;
>     }
>
>     ret = 0;
>
> >      /* Start the QEMU daemon, with the same command-line arguments plus
> >       * -incoming $migrateFrom
> >       */
> > @@ -2150,6 +2182,9 @@ qemuMigrationRun(struct qemud_driver *driver,
> >          return -1;
> >      }
> >
> > +    if (flags & VIR_MIGRATE_OFFLINE)
> > +        return 0;
> > +
> >      if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
> >                                         QEMU_MIGRATION_COOKIE_GRAPHICS)))
> >          goto cleanup;
>
> This check seems to be too deep. The whole point of it is to skip transferring
> migration data (since there's no qemu running anyway) so why do all the
> preparation work for it? I think the Perform phase should be completely skipped
> for offline migrations. Or do we actually need to do something during the
> Perform phase? It seems we don't, since it is completely skipped during
> peer-to-peer migration (not that it would actually work with this patch).

No, otherwise we would have to do much more work to clean up the offline path.

> > @@ -2665,7 +2700,12 @@ static int doPeer2PeerMigrate3(struct qemud_driver *driver,
> >                           uri, &uri_out, flags, dname, resource, dom_xml);
> >          qemuDomainObjExitRemoteWithDriver(driver, vm);
> >      }
> > +
> >      VIR_FREE(dom_xml);
> > +
> > +    if (flags & VIR_MIGRATE_OFFLINE)
> > +        goto cleanup;
> > +
> >      if (ret == -1)
> >          goto cleanup;
> >
>
> This will skip not only the Perform but also the Finish phase in peer-to-peer
> migration. You want to jump to the finish label and do that *after* the check
> for ret == -1.
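OK, will do, i.e. roughly (untested sketch of the ordering you describe):

        VIR_FREE(dom_xml);

        if (ret == -1)
            goto cleanup;

        /* an inactive domain has no migration data to transfer, so skip
         * the Perform phase and go straight to Finish on the destination */
        if (flags & VIR_MIGRATE_OFFLINE)
            goto finish;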
> > @@ -2771,7 +2811,7 @@ finish:
> >                           vm->def->name);
> >
> >  cleanup:
> > -    if (ddomain) {
> > +    if (ddomain || (flags & VIR_MIGRATE_OFFLINE)) {
> >          virObjectUnref(ddomain);
> >          ret = 0;
> >      } else {
>
> This is weird. It means that we don't care whether Finish succeeded or not
> during offline migration, even though Finish is where the domain configuration
> gets stored on disk.

my purpose is to call virObjectUnref(ddomain); the "ret = 0" is not only for
offline migration.

> > @@ -2848,7 +2888,7 @@ static int doPeer2PeerMigrate(struct qemud_driver *driver,
> >      }
> >
> >      /* domain may have been stopped while we were talking to remote daemon */
> > -    if (!virDomainObjIsActive(vm)) {
> > +    if (!virDomainObjIsActive(vm) && !(flags & VIR_MIGRATE_OFFLINE)) {
> >          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> >                         _("guest unexpectedly quit"));
> >          goto cleanup;
> > @@ -2911,7 +2951,7 @@ qemuMigrationPerformJob(struct qemud_driver *driver,
> >      if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
> >          goto cleanup;
> >
> > -    if (!virDomainObjIsActive(vm)) {
> > +    if (!virDomainObjIsActive(vm) && !(flags & VIR_MIGRATE_OFFLINE)) {
> >          virReportError(VIR_ERR_OPERATION_INVALID,
> >                         "%s", _("domain is not running"));
> >          goto endjob;
> > @@ -3236,6 +3276,8 @@ qemuMigrationFinish(struct qemud_driver *driver,
> >       */
> >      if (retcode == 0) {
> >          if (!virDomainObjIsActive(vm)) {
> > +            if (flags & VIR_MIGRATE_OFFLINE)
> > +                goto offline;
> >              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> >                             _("guest unexpectedly quit"));
> >              goto endjob;
> > @@ -3255,6 +3297,7 @@
> >          if (qemuDomainMigrateOPDRelocate(driver, vm, mig) < 0)
> >              VIR_WARN("unable to provide network data for relocation");
> >
> > +    offline:
> >          if (flags & VIR_MIGRATE_PERSIST_DEST) {
> >              virDomainDefPtr vmdef;
> >              if (vm->persistent)
>
> Again, pretty confusing usage of the offline label.
>
> > @@ -3302,7 +3345,7 @@
> >          event = NULL;
> >      }
> >
> > -    if (!(flags & VIR_MIGRATE_PAUSED)) {
> > +    if (!(flags & VIR_MIGRATE_PAUSED) && !(flags & VIR_MIGRATE_OFFLINE)) {
> >          /* run 'cont' on the destination, which allows migration on qemu
> >           * >= 0.10.6 to work properly. This isn't strictly necessary on
> >           * older qemu's, but it also doesn't hurt anything there
> > @@ -3351,9 +3394,11 @@
> >                                               VIR_DOMAIN_EVENT_SUSPENDED,
> >                                               VIR_DOMAIN_EVENT_SUSPENDED_PAUSED);
> >          }
> > -        if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0) {
> > -            VIR_WARN("Failed to save status on vm %s", vm->def->name);
> > -            goto endjob;
> > +        if (virDomainObjIsActive(vm)) {
> > +            if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0) {
> > +                VIR_WARN("Failed to save status on vm %s", vm->def->name);
> > +                goto endjob;
> > +            }
> >          }
> >
> >          /* Guest is successfully running, so cancel previous auto destroy */
> > @@ -3373,6 +3418,7 @@
> >  endjob:
> >      if (qemuMigrationJobFinish(driver, vm) == 0) {
> >          vm = NULL;
> > +    } else if (flags & VIR_MIGRATE_OFFLINE) {
> >      } else if (!vm->persistent && !virDomainObjIsActive(vm)) {
> >          qemuDomainRemoveInactive(driver, vm);
> >          vm = NULL;
>
> This is wrong since vm->persistent will always be true for offline
> migration. Unless something failed early, in which case we actually want to
> remove the domain. The current code should work without any modification.

right, unnecessary now, will remove.
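and the offline label in qemuMigrationFinish could go away too by guarding
the running-guest checks with the flag instead of jumping over them, roughly
(untested sketch):

    if (retcode == 0) {
        bool offline = !!(flags & VIR_MIGRATE_OFFLINE);

        /* an offline migration never has a running guest to check */
        if (!virDomainObjIsActive(vm) && !offline) {
            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                           _("guest unexpectedly quit"));
            goto endjob;
        }

        ...   /* running-guest steps stay behind !offline checks */

        if (flags & VIR_MIGRATE_PERSIST_DEST) {
            ...
        }
    }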
> > > @@ -3420,6 +3466,9 @@ int qemuMigrationConfirm(struct qemud_driver *driver, > > if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, 0))) > > return -1; > > > > + if (flags & VIR_MIGRATE_OFFLINE) > > + goto offline; > > + > > /* Did the migration go as planned? If yes, kill off the > > * domain object, but if no, resume CPUs > > */ > > @@ -3455,6 +3504,7 @@ int qemuMigrationConfirm(struct qemud_driver *driver, > > } > > } > > > > +offline: > > qemuMigrationCookieFree(mig); > > rv = 0; > > > > Confusing label again, I'd use "done" or something similar. > > We're almost there, just some unusual code paths (other than successful > non-p2p migration) need fixing. Moreover, I think we should explicitly fail v2 > migration with OFFLINE flag to avoid having to touch that path as well. And really? I think we will not fall into V2 migration path anymore, for we always check V3 first, and will return true(qemu driver). > since you need to do changes to doPeer2PeerMigrate3 you should do similar > changes to migration APIs in libvirt.c (since they are very similar) which > should let you avoid changing some other parts. > > Jirka Thanks! -- liguang lig.fnst@xxxxxxxxxxxxxx FNST linux kernel team -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list