Thanks John for the comments. On Fri, Sep 18, 2015 at 10:34 PM, John Ferlan <jferlan@xxxxxxxxxx> wrote: > > > On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote: >> Tunnelled migration can hang if the destination qemu exits despite all the >> ABI checks. This happens whenever the destination qemu exits before the >> complete transfer is noticed by source qemu. The savevm state checks at >> runtime can fail at destination and cause qemu to error out. >> The source qemu cant notice it as the EPIPE is not propogated to it. >> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend() >> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would >> never get to 100% transfer completion. >> The qemuMigrationWaitForCompletion() never breaks out as well since >> the ssh connection to destination is healthy, and the source qemu also thinks >> the migration is ongoing as the Fd to which it transfers, is never >> closed or broken. So, the migration will hang forever. Even Ctrl-C on the >> virsh migrate wouldn't be honoured. Close the source side FD when there is >> an error in the stream. That way, the source qemu updates itself and >> qemuMigrationWaitForCompletion() notices the failure. >> >> Close the FD for all kinds of errors to be sure. The error message is not >> copied for EPIPE so that the destination error is copied instead later. >> >> Note: >> Reproducible with repeated migrations between Power hosts running in different >> subcores-per-core modes. >> >> Changes from v1 -> v2: >> VIR_FORCE_CLOSE() was called twice for this use case which would log >> unneccessary warnings. So, move the fd close to qemuMigrationIOFunc >> so that there are no unnecessary duplicate attempts.(Would this trigger >> a Coverity error? I don't have a setup to check.) >> >> Signed-off-by: Shivaprasad G Bhat <sbhat@xxxxxxxxxxxxxxxxxx> >> --- >> src/qemu/qemu_migration.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c >> index ff89ab5..9602fb2 100644 >> --- a/src/qemu/qemu_migration.c >> +++ b/src/qemu/qemu_migration.c >> @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg) >> if (virStreamFinish(data->st) < 0) >> goto error; >> >> + VIR_FORCE_CLOSE(data->sock); >> VIR_FREE(buffer); >> >> return; >> @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg) >> } >> >> error: >> - virCopyLastError(&data->err); >> + /* Let the source qemu know that the transfer cant continue anymore. >> + * Don't copy the error for EPIPE as destination has the actual error. */ >> + VIR_FORCE_CLOSE(data->sock); >> + if (!virLastErrorIsSystemErrno(EPIPE)) >> + virCopyLastError(&data->err); >> virResetLastError(); >> VIR_FREE(buffer); >> } >> @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, >> if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0) >> ret = -1; >> } >> - VIR_FORCE_CLOSE(fd); > > ^^^ > > This causes Coverity to claim a RESOURCE_LEAK > > Feels like this was a mistake edit... > The VIR_FORCE_CLOSE() inside qemuMigrationIOFunc() alone is sufficient. Having this again here would lead to Warning in the logs. I too thought coverity would complain. Is there a way to force ignore a coverity warning? Thanks and Regards, Shivaprasad > The rest of the patch looks reasonable; however, I'm far from the expert > in these matters. > > John >> >> if (priv->job.completed) { >> qemuDomainJobInfoUpdateTime(priv->job.completed); >> >> -- >> libvir-list mailing list >> libvir-list@xxxxxxxxxx >> https://www.redhat.com/mailman/listinfo/libvir-list >> -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list