On 10/29/2013 11:22 AM, Cole Robinson wrote:
> On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
>> On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
>>> On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
>>>> On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
>>>>> On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
>>>>>> On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
>>>>>>> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
>>>>>>>> From: "Daniel P. Berrange" <berrange@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> The following sequence
>>>>>>>>
>>>>>>>>  1. Define a persistent QEMU guest
>>>>>>>>  2. Start the QEMU guest
>>>>>>>>  3. Stop libvirtd
>>>>>>>>  4. Kill the QEMU process
>>>>>>>>  5. Start libvirtd
>>>>>>>>  6. List persistent guests
>>>>>>>>
>>>>>>>> At the last step, the previously running persistent guest
>>>>>>>> will be missing. This is because of a race condition in the
>>>>>>>> QEMU driver startup code. It does
>>>>>>>>
>>>>>>>>  1. Load all VM state files
>>>>>>>>  2. Spawn thread to reconnect to each VM
>>>>>>>>  3. Load all VM config files
>>>>>>>>
>>>>>>>> Only at the end of step 3, does the 'virDomainObjPtr' get
>>>>>>>> marked as "persistent". There is therefore a window where
>>>>>>>> the thread reconnecting to the VM will remove the persistent
>>>>>>>> VM from the list.
>>>>>>>>
>>>>>>>> The easy fix is to simply switch the order of steps 2 & 3.
>>>>>>>>
>>>>>>>> Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  src/qemu/qemu_driver.c | 3 +--
>>>>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>>>>> index c613967..9c3daad 100644
>>>>>>>> --- a/src/qemu/qemu_driver.c
>>>>>>>> +++ b/src/qemu/qemu_driver.c
>>>>>>>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>>>>>>>>
>>>>>>>>      conn = virConnectOpen(cfg->uri);
>>>>>>>>
>>>>>>>> -    qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>>> -
>>>>>>>>      /* Then inactive persistent configs */
>>>>>>>>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>>>>>>>>                                         cfg->configDir,
>>>>>>>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>>>>>>>>                                         NULL, NULL) < 0)
>>>>>>>>          goto error;
>>>>>>>>
>>>>>>>> +    qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>>>
>>>>>>>>      virDomainObjListForEach(qemu_driver->domains,
>>>>>>>>                              qemuDomainSnapshotLoad,
>>>>>>>>
>>>>>>>
>>>>>>> I tried testing this patch to see if it would fix:
>>>>>>>
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
>>>>>>>
>>>>>>> from current master I did:
>>>>>>>
>>>>>>> git revert a924d9d083c215df6044387057c501d9aa338b96
>>>>>>> reproduce the bug
>>>>>>> git am <your-patch>
>>>>>>>
>>>>>>> But the daemon won't even start up after your patch is built:
>>>>>>>
>>>>>>> (gdb) bt
>>>>>>> #0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
>>>>>>>     cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
>>>>>>>     opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
>>>>
>>>>> Sorry for not being clear: The daemon crashes, that's the backtrace.
>>>>
>>>> Hmm config is NULL - do the state XML files not include the
>>>> monitor info perhaps ?
>>>>
>>>
>>> I see:
>>>
>>> pidfile for busted VM in /var/run/libvirt/qemu
>>> nothing in /var/cache/libvirt/qemu
>>> no state that I can see in /var/lib/libvirt/qemu
>>>
>>> But I'm not sure where it's supposed to be stored.
>>>
>>> FWIW reproducing this state was pretty simple: revert
>>> a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove
>>> all <video> and <graphics> devices, start the guest, libvirtd crashes.
>>
>> Ok, I believe you probably have SELinux disabled on your machine or in
>> libvirtd. With SELinux enabled you hit another bug first
>>
>> 2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64
>>
>>
>> which prevents hitting the crash you report. The fix is the same in both
>> cases - we must skip VMs with PID of zero. I've sent a v2 patch.
>>
>
> Hmm, selinux is permissive here but not disabled. But I'll try your patches
> and report back.
>

Applied both patches, the original bug report and the crash I reported here
are both fixed.

Thanks Dan!

- Cole

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
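
For readers following the thread, here is a minimal sketch of the ordering problem described in the original patch. It is not libvirt code: the struct and every toy_* function are hypothetical names invented for illustration, and the "old order" case assumes one particular interleaving, in which the reconnect thread decides to drop the guest before the config load has marked it persistent. The real window depends on thread timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain object; not libvirt's real type. */
struct toy_domain {
    bool listed;      /* still present in the driver's domain list */
    bool persistent;  /* set only once the config file has been loaded */
    bool pid_alive;   /* whether the QEMU process from the state file exists */
};

/* Reconnect, part 1: decide whether the domain should be dropped. */
static bool toy_reconnect_decide(const struct toy_domain *dom)
{
    assert(dom != NULL);
    return dom->listed && !dom->pid_alive && !dom->persistent;
}

/* Reconnect, part 2: act on the earlier decision. */
static void toy_reconnect_apply(struct toy_domain *dom, bool drop)
{
    assert(dom != NULL);
    if (drop)
        dom->listed = false;
}

/* Config load: marks a listed domain as persistent. */
static void toy_load_config(struct toy_domain *dom)
{
    assert(dom != NULL);
    if (dom->listed)
        dom->persistent = true;
}

/* Old order, assuming the reconnect decision lands before the config load. */
static bool toy_startup_old(void)
{
    struct toy_domain dom = { true, false, false }; /* QEMU was killed */
    bool drop = toy_reconnect_decide(&dom);
    toy_load_config(&dom);
    toy_reconnect_apply(&dom, drop);
    return dom.listed;
}

/* Patched order: configs are loaded before any reconnect work starts. */
static bool toy_startup_patched(void)
{
    struct toy_domain dom = { true, false, false }; /* QEMU was killed */
    toy_load_config(&dom);
    toy_reconnect_apply(&dom, toy_reconnect_decide(&dom));
    return dom.listed;
}
```

In this toy model, toy_startup_old() reports the guest as no longer listed, while toy_startup_patched() keeps it, which matches the symptom and the fix described in the commit message. The later v2 change (skipping VMs with a PID of zero) is not modelled here.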