On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote: > Tunnelled migration can hang if the destination qemu exits despite all the > ABI checks. This happens whenever the destination qemu exits before the > complete transfer is noticed by source qemu. The savevm state checks at > runtime can fail at destination and cause qemu to error out. > The source qemu cant notice it as the EPIPE is not propogated to it. > The qemuMigrationIOFunc() notices the stream being broken from virStreamSend() > and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would > never get to 100% transfer completion. > The qemuMigrationWaitForCompletion() never breaks out as well since > the ssh connection to destination is healthy, and the source qemu also thinks > the migration is ongoing as the Fd to which it transfers, is never > closed or broken. So, the migration will hang forever. Even Ctrl-C on the > virsh migrate wouldn't be honoured. Close the source side FD when there is > an error in the stream. That way, the source qemu updates itself and > qemuMigrationWaitForCompletion() notices the failure. > > Close the FD for all kinds of errors to be sure. The error message is not > copied for EPIPE so that the destination error is copied instead later. > > Note: > Reproducible with repeated migrations between Power hosts running in different > subcores-per-core modes. > > Changes from v1 -> v2: > VIR_FORCE_CLOSE() was called twice for this use case which would log > unneccessary warnings. So, move the fd close to qemuMigrationIOFunc > so that there are no unnecessary duplicate attempts.(Would this trigger > a Coverity error? I don't have a setup to check.) > > Signed-off-by: Shivaprasad G Bhat <sbhat@xxxxxxxxxxxxxxxxxx> > --- > src/qemu/qemu_migration.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c > index ff89ab5..9602fb2 100644 > --- a/src/qemu/qemu_migration.c > +++ b/src/qemu/qemu_migration.c > @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg) > if (virStreamFinish(data->st) < 0) > goto error; > > + VIR_FORCE_CLOSE(data->sock); > VIR_FREE(buffer); > > return; > @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg) > } > > error: > - virCopyLastError(&data->err); > + /* Let the source qemu know that the transfer cant continue anymore. > + * Don't copy the error for EPIPE as destination has the actual error. */ > + VIR_FORCE_CLOSE(data->sock); > + if (!virLastErrorIsSystemErrno(EPIPE)) > + virCopyLastError(&data->err); > virResetLastError(); > VIR_FREE(buffer); > } > @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, > if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0) > ret = -1; > } > - VIR_FORCE_CLOSE(fd); ^^^ This causes Coverity to claim a RESOURCE_LEAK Feels like this was a mistake edit... The rest of the patch looks reasonable; however, I'm far from the expert in these matters. John > > if (priv->job.completed) { > qemuDomainJobInfoUpdateTime(priv->job.completed); > > -- > libvir-list mailing list > libvir-list@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/libvir-list > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list