Re: [PATCH] qemu_migration: Avoid crashing if domain dies too quickly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/08/2013 10:31 AM, Michal Privoznik wrote:
> I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu
> died too quickly = in Prepare phase. What is happening here is:
> 
> 1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
> qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
> and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
> succeeds, however switching monitor to QMP mode fails as qemu died
> meanwhile. That is qemuMonitorSetCapabilities() returns -1.
> 
> 2013-10-08 15:54:10.629+0000: 3493: debug : qemuMonitorSetCapabilities:1356 : mon=0x14a53da0
> 2013-10-08 15:54:10.630+0000: 3493: debug : qemuMonitorJSONCommandWithFd:262 : Send command '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
> 2013-10-08 15:54:10.630+0000: 3493: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17 events=13
> ...
> 2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 : QEMU_MONITOR_SEND_MSG: mon=0x14a53da0 msg={"execute":"qmp_capabilities","id":"libvirt-1"}
>  fd=-1
> 2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 : Poll got 1 event(s)
> 
> 2) [Thread 3262] The event loop is trying to do the talking to monitor.
> However, qemu is dead already, remember?
> 
> 2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 : Unable to read from monitor: Connection reset by peer
> 2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
> ...
> 2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send command resulted in error internal error: early end of file from monitor: possible problem:
> 
> 3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
> 'endjob' label and subsequently to the 'cleanup'. Since the domain is
> not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
> This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
> Unpleasant because the event loop which is about to trigger EOF callback
> still holds a pointer to the @vm (not the reference). See the valgrind
> output below.
> 
> 4) [Thread 3262] So the even loop starts triggering EOF:

s/even/event/

> 
> 2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 : Triggering EOF callback
> 2013-10-08 15:54:13.543+0000: 3262: debug : qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10'
> 
> And the monitor is cleaned up. This results in calling
> qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer is
> kept in qemuMonitor struct.
> 
> ==3262== Thread 1:
> ==3262== Invalid read of size 4

> 
> The mon->vm is set in qemuMonitorOpenInternal() which is the correct
> place to increase @vm ref counter. The correct place to decrease the ref
> counter is then qemuMonitorDispose(). However, in order to avoid cyclic
> calling of dispose functions, we must use hack, and set priv->mon to
> NULL. The qemuDomainObjPrivateFree won't call us again then. This
> describes the last two chunks of the commit. The others are there for
> handling virObjectRef() and virObjectUnref() correctly. So far it
> doesn't work well with NULL virClassPtr :)
> 
> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
> ---
> 
> Yet another one of those patches, where chasing it down took a half of a day
> and description what went wrong is longer than a few lines of fix.

Yeah, we've had a few of those lately.

> 
>  src/qemu/qemu_capabilities.c | 14 ++++++++++----
>  src/qemu/qemu_monitor.c      | 12 ++++++++++++
>  2 files changed, 22 insertions(+), 4 deletions(-)

> @@ -781,6 +792,7 @@ qemuMonitorOpenInternal(virDomainObjPtr vm,
>      }
>      mon->fd = fd;
>      mon->hasSendFD = hasSendFD;
> +    virObjectRef(vm);
>      mon->vm = vm;

Could be written:
mon->vm = virObjectRef(vm);

Overall, looks fine to me, but I'd feel more comfortable if you also got
Dan to review it before pushing.

ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]