Re: Question about migration confirm phase

On 10/14/19 2:18 AM, Jiri Denemark wrote:
> On Fri, Oct 11, 2019 at 23:18:29 +0000, Jim Fehlig wrote:
>> I've been investigating a lockd lock ordering bug in a migration error handling
>> path in the libxl driver. In the perform phase, the src calls
>> virDomainLockProcessPause to release the lock before sending the VM to dst. In
>> this case the send fails for other reasons and an attempt is made to reacquire
>> the lock with virDomainLockProcessResume. But that fails since the dst has not
>> finished cleaning up the failed VM and releasing the lock it acquired when
>> starting to receive the VM. My immediate reaction was "why not reacquire the
>> lock in the confirm phase", but then I saw my older comment a few lines later in
>> the perform phase code
>>
>>           /*
>>            * Confirm phase will not be executed if perform fails. End the
>>            * job started in begin phase.
>>            */
>>
>> Is that just a bug in the implementation, or is it intended to skip the confirm
>> phase if perform fails?
> 
> It's intended. The Perform phase runs on the source host so why should
> we call Confirm to let the source know about the failure?

To do any cleanup of the failed migration after the dst has done its cleanup in
the finish phase?

> But of course,
> the source has to clean up after the failed migration similarly to what
> Confirm would do.

I've made slight changes to the lock ordering and it looks promising after 
initial tests. I'll post a patch after further testing. Thanks!
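
For anyone reading this in the archive, the ordering problem described at the
top of the thread can be illustrated with a toy model. This is only a sketch
under stated assumptions: toy_lock and the toy_* helpers are made-up names, not
libvirt or lockd APIs, and the code says nothing about what the eventual patch
looks like. It only shows why the src cannot reacquire the lock until the dst
has released it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lockd lock on the VM: at most one owner. */
typedef struct {
    const char *owner;   /* NULL when nobody holds the lock */
} toy_lock;

/* Try to take the lock for `who`; fails if someone still holds it. */
static bool toy_lock_acquire(toy_lock *lk, const char *who)
{
    if (lk->owner != NULL)
        return false;
    lk->owner = who;
    return true;
}

/* Release the lock; only the current owner may do so. */
static void toy_lock_release(toy_lock *lk, const char *who)
{
    assert(lk->owner != NULL && strcmp(lk->owner, who) == 0);
    lk->owner = NULL;
}

/*
 * Model a perform phase whose send fails. Returns true if the src ends
 * up holding the lock again. dst_cleans_up_first selects whether the dst
 * releases its lock before or after the src tries to reacquire.
 */
static bool toy_failed_perform(bool dst_cleans_up_first)
{
    toy_lock lk = { .owner = "src" };
    bool reacquired;

    /* src releases the lock before sending the VM */
    toy_lock_release(&lk, "src");

    /* dst acquires the lock when it starts receiving the VM */
    if (!toy_lock_acquire(&lk, "dst"))
        return false;

    /* ... the send fails here for unrelated reasons ... */

    if (dst_cleans_up_first) {
        toy_lock_release(&lk, "dst");
        reacquired = toy_lock_acquire(&lk, "src");
    } else {
        /* src retries while dst still owns the lock, so this fails */
        reacquired = toy_lock_acquire(&lk, "src");
        toy_lock_release(&lk, "dst");
    }
    return reacquired;
}
```

In this model toy_failed_perform(false) corresponds to the failing case from
the original report, and toy_failed_perform(true) to the case where the dst has
already finished its cleanup.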

Regards,
Jim

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list