On Tue, Oct 13, 2015 at 4:51 PM, Shivaprasad bhat <shivaprasadbhat@xxxxxxxxx> wrote: > On Mon, Oct 12, 2015 at 8:03 PM, Jiri Denemark <jdenemar@xxxxxxxxxx> wrote: >> On Thu, Oct 08, 2015 at 17:59:07 +0530, Shivaprasad G Bhat wrote: >>> Tunnelled migration can hang if the destination qemu exits despite all the >>> ABI checks. This happens whenever the destination qemu exits before the >>> complete transfer is noticed by source qemu. The savevm state checks at >>> runtime can fail at destination and cause qemu to error out. >>> The source qemu cant notice it as the EPIPE is not propogated to it. >>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend() >>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would >>> never get to 100% transfer completion. >>> The qemuMigrationWaitForCompletion() never breaks out as well since >>> the ssh connection to destination is healthy, and the source qemu also thinks >>> the migration is ongoing as the Fd to which it transfers, is never >>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the >>> virsh migrate wouldn't be honoured. Close the source side FD when there is >>> an error in the stream. That way, the source qemu updates itself and >>> qemuMigrationWaitForCompletion() notices the failure. >>> >>> Close the FD for all kinds of errors to be sure. The error message is not >>> copied for EPIPE so that the destination error is copied instead later. >>> >>> Note: >>> Reproducible with repeated migrations between Power hosts running in different >>> subcores-per-core modes. >>> >>> Changes from v2 -> v3: >>> VIR_FORCE_CLOSE() needn't be called on the qemu source fd as that is closed >>> by qemuMigrationIOFunc() for tunnelled migration. Set the fd to -1. >>> >>> Signed-off-by: Shivaprasad G Bhat <sbhat@xxxxxxxxxxxxxxxxxx> >>> --- >>> src/qemu/qemu_migration.c | 18 ++++++++++++++---- >>> 1 file changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c >>> index 7440108..c982838 100644 >>> --- a/src/qemu/qemu_migration.c >>> +++ b/src/qemu/qemu_migration.c >>> @@ -4019,6 +4019,7 @@ static void qemuMigrationIOFunc(void *arg) >>> if (virStreamFinish(data->st) < 0) >>> goto error; >>> >>> + VIR_FORCE_CLOSE(data->sock); >>> VIR_FREE(buffer); >>> >>> return; >>> @@ -4036,7 +4037,11 @@ static void qemuMigrationIOFunc(void *arg) >>> } >>> >>> error: >>> - virCopyLastError(&data->err); >>> + /* Let the source qemu know that the transfer cant continue anymore. >>> + * Don't copy the error for EPIPE as destination has the actual error. */ >>> + VIR_FORCE_CLOSE(data->sock); >>> + if (!virLastErrorIsSystemErrno(EPIPE)) >>> + virCopyLastError(&data->err); >>> virResetLastError(); >>> VIR_FREE(buffer); >>> } >> >> Hmm, I don't think we need to filter out any error here. Were you >> actually able to get the EPIPE error except for seeing it in the logs? >> What I'm trying to say is: if we get an EPIPE in qemuMigrationIOFunc, we >> close data->sock which will propagate to the source QEMU and >> qemuMigrationWaitForCompletion should return a "migration unexpectedly >> failed" error, which will get stored in orig_err and even though >> qemuMigrationStopTunnel will set the thread local error to the one from >> iothread->data, we will reset it back to the orig_err one. Only in case >> qemuMigrationWaitForCompletion returned success but we still got error >> on the tunnel, we'd actually see the error from iothread->err. >> > > Yes. The qemuMigrationWaitForCompletion() returns success sometimes and > StopTunnel propogating the failure in such a case. This happens inconsistently, > but I do see it happening. Please let me know what I should be doing > if this is not the > way. > Hi Jirka, As you mentioned over IRC, I checked what would be the result of migration if the destination has no error message. I modified the qemu to not to emit any error messages. The error in such case is "internal error: End of file from monitor" which of course is coming from libvirtd is as good as "cannot write to stream: Broken pipe". Both anyway convey very less as to what to do. So, I think ignoring EPIPE is the way to go. Please let me know your opinion. Thanks, Shiva > Thanks, > Shivaprasad > >> Otherwise the patch looks fine. >> >> Jirka -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list