On 10/28/2013 07:52 AM, Daniel P. Berrange wrote: > From: "Daniel P. Berrange" <berrange@xxxxxxxxxx> > > The following sequence > > 1. Define a persistent QMEU guest s/QMEU/QEMU > 2. Start the QEMU guest > 3. Stop libvirtd > 4. Kill the QEMU process > 5. Start libvirtd > 6. List persistent guets s/guets/guests > > At the last step, the previously running persistent guest > will be missing. This is because of a race condition in the > QEMU driver startup code. It does > > 1. Load all VM state files > 2. Spawn thread to reconnect to each VM > 3. Load all VM config files > > Only at the end of step 3, does the 'virDomainObjPtr' get > marked as "persistent". There is therefore a window where > the thread reconnecting to the VM will remove the persistent > VM from the list. > > The easy fix is to simply switch the order of steps 2 & 3. > > Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx> > --- > src/qemu/qemu_driver.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > The fix seems reasonable, although I immediately wondered why "at some time" it was considered OK to reconnect before being persistent flag was set for inactive guests. The condition/fault initially described includes a host reboot in the processing. In that case (I would assume) the restart of the guest would occur if autostart was set. The external action of the kill()'ing of a guest outside of libvirtd's control results in some unknown/unpredictable state for the guest. Is there something in that initial load that could detect this condition better? I tried following the steps without the patch, but on my host the guest was listed after the restart - so yes a timing condition - but what causes that timing condition. Would setting the dom->persistent before the virObjectUnlock(dom) in virDomainObjListLoadAllConfigs() change the results? Beyond that keeping the virConnectOpen() and qemuProcessReconnectAll() "together" after the loading of the inactive persistent configs seems to keep code flow more normal. Whether that comes before or after the Snapshot/ManagedSave load is I suppose just an "implementation detail". Also, other drivers follow the load running, reconnect, and load inactive/persistent configs. Should those have similar patches as well? John > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index c613967..9c3daad 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged, > > conn = virConnectOpen(cfg->uri); > > - qemuProcessReconnectAll(conn, qemu_driver); > - > /* Then inactive persistent configs */ > if (virDomainObjListLoadAllConfigs(qemu_driver->domains, > cfg->configDir, > @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged, > NULL, NULL) < 0) > goto error; > > + qemuProcessReconnectAll(conn, qemu_driver); > > virDomainObjListForEach(qemu_driver->domains, > qemuDomainSnapshotLoad, > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list