On Mon, Jan 29, 2018 at 15:56:29 +0530, Prerna wrote:
> Hi Jirka,
>
> On Thu, Jan 25, 2018 at 8:43 PM, Jiri Denemark <jdenemar@xxxxxxxxxx> wrote:
>
> > On Thu, Jan 25, 2018 at 19:51:23 +0530, Prerna Saxena wrote:
> > > In case of non-p2p migration, if the libvirt client gets disconnected
> > > from source libvirt after the PERFORM phase is over, the daemon just
> > > resets the current migration job. However, the VM could be left paused
> > > on both source and destination in such a case. If the client reconnects
> > > and queries migration status, the job has been blanked out from source
> > > libvirt, and this reconnected client has no clear way of figuring out
> > > whether an unclean migration had previously been aborted.
> >
> > The virDomainGetState API should return VIR_DOMAIN_PAUSED with
> > VIR_DOMAIN_PAUSED_MIGRATION reason. Is this not enough?
>
> I understand that a client application should poll source libvirtd for
> status of migration job completion using virDomainGetJobStats().

Not really. It may poll if it wants to monitor migration progress, but
normally the client would just wait for the migration API to return either
success or failure.

> However, as you explained above, cleanup callbacks clear the job info, so a
> client should additionally be polling with virDomainGetState() too.

Well, even if virDomainGetJobStats with the VIR_DOMAIN_JOB_STATS_COMPLETED
flag were modified to report the job as VIR_DOMAIN_JOB_FAILED, the client
would still need to call virDomainGetState (on both sides in some cases) to
check whether the domain is running or was left in a paused state. So the
reporting of a failed job by virDomainGetJobStats does not seem to be really
necessary. And it would be a bit confusing too, since the flag is called
*_COMPLETED while the migration in fact did not complete. This confusion
could be fixed by introducing a new flag, but...

> Would it not be cleaner to have a single API reflect the source of truth?
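To illustrate the check being discussed, here is a minimal Python sketch of what a reconnecting client would do: inspect the domain state and paused reason rather than the (already cleared) job info. The `DomainStub` class is a hypothetical stand-in for a `libvirt.virDomain` object so the logic can be shown without a running libvirtd; the numeric constants mirror libvirt's `virDomainState` and `virDomainPausedReason` enum values as I recall them, so verify against libvirt.h before relying on them.

```python
# Enum values mirroring libvirt's headers (assumed, verify against libvirt.h):
VIR_DOMAIN_PAUSED = 3            # virDomainState: the domain is paused
VIR_DOMAIN_PAUSED_MIGRATION = 2  # virDomainPausedReason: paused for migration


class DomainStub:
    """Hypothetical stand-in for libvirt.virDomain; its state() method
    returns a (state, reason) pair like the real Python binding does."""

    def __init__(self, state, reason):
        self._state = (state, reason)

    def state(self):
        return self._state


def left_paused_by_migration(dom):
    """Return True if the domain appears to have been left paused by an
    aborted or interrupted migration."""
    state, reason = dom.state()
    return state == VIR_DOMAIN_PAUSED and reason == VIR_DOMAIN_PAUSED_MIGRATION
```

With the real libvirt-python binding, `dom` would come from something like `conn.lookupByName(...)` and the same unpacking of `dom.state()` applies. As noted above, a careful client may need to run this check against both the source and the destination daemon.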
Perhaps, but since there already is a way of getting the info, any client
which wants to work with more than just a bleeding-edge libvirt would still
need to implement the existing way. And why would the client bother using the
new API when it can be sure the old way will still be available? Doing so
would make the client even more complicated for no benefit. But as I said,
just seeing that a previous migration job failed is not enough to recover
from a disconnected client which was controlling a non-p2p migration.

BTW, p2p migration is far less fragile in this respect. If the connection to
a client breaks, migration normally continues without any disruption. And if
the connection between the libvirt daemons fails, both sides will detect it
and abort the migration. Of course, a split brain can still happen even with
p2p migration, but it's not as easy to trigger, since the time frame in which
the connection has to break to cause a split brain is much shorter.

Jirka

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list