On 10/14/19 2:18 AM, Jiri Denemark wrote: > On Fri, Oct 11, 2019 at 23:18:29 +0000, Jim Fehlig wrote: >> I've been investigating a lockd lock ordering bug in a migration error handling >> path in the libxl driver. In the perform phase, the src calls >> virDomainLockProcessPause to release the lock before sending the VM to dst. In >> this case the send fails for other reasons and an attempt is made to reacquire >> the lock with virDomainLockProcessResume. But that fails since the dst has not >> finished cleaning up the failed VM and releasing the lock it acquired when >> starting to receive the VM. My immediate reaction was "why not reacquire the >> lock in the confirm phase", but then I saw my older comment a few lines later in >> the perform phase code >> >> /* >> * Confirm phase will not be executed if perform fails. End the >> * job started in begin phase. >> */ >> >> Is that just a bug in the implementation, or is it intended to skip the confirm >> phase if perform fails? > > It's intended. The Perform phase runs on the source hosts so why should > we call Confirm to let the source know about the failure? To do any cleanup of the failed migration after the dst has done it's cleanup in the finish phase? > But of course, > the source has to cleanup after the failed migration similarly to what > Confirm would do. I've made slight changes to the lock ordering and it looks promising after initial tests. I'll post a patch after further testing. Thanks! Regards, Jim -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list