On 13.11.2014 03:36, Jim Fehlig wrote:
This series of patches fixes problems discovered in libxl migration.

The first patch fixes an issue that went undetected while testing the initial implementation of migration. Receiving migration data occurs in the context of an event loop callback, effectively blocking the event loop during the entire migration process. The patch moves the work of receiving migration data to a thread.

Interestingly, this issue manifested in a failed migration due to failed keepalives, which would kill virsh's connection to dst host. The dst host failed to respond to keepalives since its event loop was blocked on receiving migration data. Ultimately the migration perform phase would succeed leaving a running domain on dst. However, the subsequent finish phase would fail since virsh's connection to dst had been killed by the keepalive failure. Since finish failed, the confirm phase would resume the domain on src. Yikes! Same domain running on two different hosts :(.

Patches 2 and 3 improve handling of errors in the event the perform or finish phases of migration fail. See the individual patches for details.

Jim Fehlig (3):
  libxl: Receive migration data in a thread
  libxl: start domain paused on migration dst
  libxl: destroy domain in migration finish phase on failure

 src/libxl/libxl_migration.c | 75 ++++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 24 deletions(-)
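For readers who want a picture of the general pattern behind the first patch, here is a minimal sketch of handing a long blocking receive from an event loop callback to a worker thread, so the loop stays free to answer keepalives. It uses plain POSIX threads and is not the code from the patch: struct recv_args, recv_thread(), on_incoming_migration() and wait_for_receive() are hypothetical names invented for this illustration, and the real driver presumably uses libvirt's own thread helpers rather than raw pthreads.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-migration state; not a libvirt structure. */
struct recv_args {
    int fd;                 /* descriptor carrying the migration stream */
    size_t total;           /* bytes received, valid once done is set */
    int done;               /* set to 1 when the sender closes the stream */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

/* Worker thread: performs the long, blocking receive outside the event loop. */
void *recv_thread(void *opaque)
{
    struct recv_args *args = opaque;
    char buf[4096];
    size_t total = 0;

    for (;;) {
        ssize_t n = read(args->fd, buf, sizeof(buf));
        if (n > 0)
            total += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;       /* interrupted by a signal, retry */
        else
            break;          /* end of stream or a real error */
    }

    /* Publish the result and wake anyone waiting for completion. */
    pthread_mutex_lock(&args->lock);
    args->total = total;
    args->done = 1;
    pthread_cond_signal(&args->cond);
    pthread_mutex_unlock(&args->lock);
    return NULL;
}

/* Event loop callback: only starts the worker and returns immediately.
 * Returns 0 on success, -1 if the thread could not be created. */
int on_incoming_migration(struct recv_args *args)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, recv_thread, args) != 0)
        return -1;
    pthread_detach(tid);
    return 0;
}

/* Blocks the caller (not the event loop) until the receive has finished. */
size_t wait_for_receive(struct recv_args *args)
{
    size_t total;

    pthread_mutex_lock(&args->lock);
    while (!args->done)
        pthread_cond_wait(&args->cond, &args->lock);
    total = args->total;
    pthread_mutex_unlock(&args->lock);
    return total;
}
```

The caller is assumed to initialize lock and cond (for example with pthread_mutex_init() and pthread_cond_init()) and to keep the struct alive until done is set. The point of the sketch is only that the callback returns as soon as the thread is started, which is what keeps keepalive handling responsive.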
ACK series, but see my comment to 1/3.

Michal