On Mon, Oct 28, 2013 at 09:17:16AM -0400, John Ferlan wrote:
> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
> > From: "Daniel P. Berrange" <berrange@xxxxxxxxxx>
> >
> > The following sequence
> >
> >   1. Define a persistent QMEU guest
>
> s/QMEU/QEMU
>
> >   2. Start the QEMU guest
> >   3. Stop libvirtd
> >   4. Kill the QEMU process
> >   5. Start libvirtd
> >   6. List persistent guets
>
> s/guets/guests
>
> >
> > At the last step, the previously running persistent guest
> > will be missing. This is because of a race condition in the
> > QEMU driver startup code. It does
> >
> >   1. Load all VM state files
> >   2. Spawn thread to reconnect to each VM
> >   3. Load all VM config files
> >
> > Only at the end of step 3 does the 'virDomainObjPtr' get
> > marked as "persistent". There is therefore a window where
> > the thread reconnecting to the VM will remove the persistent
> > VM from the list.
> >
> > The easy fix is to simply switch the order of steps 2 & 3.
> >
> > Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
> > ---
> >  src/qemu/qemu_driver.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
>
> The fix seems reasonable, although I immediately wondered why "at some
> time" it was considered OK to reconnect before the persistent flag was
> set for inactive guests. The condition/fault initially described
> includes a host reboot in the processing. In that case (I would assume)
> the restart of the guest would occur if autostart was set. The external
> action of kill()'ing a guest outside of libvirtd's control results in
> some unknown/unpredictable state for the guest. Is there something in
> that initial load that could detect this condition better? I tried
> following the steps without the patch, but on my host the guest was
> listed after the restart - so yes a timing condition - but what causes
> that timing condition?
> Would setting dom->persistent before the virObjectUnlock(dom) in
> virDomainObjListLoadAllConfigs() change the results?

No, I tried that and it didn't help. None the less, that should be fixed
separately.

> Beyond that, keeping the virConnectOpen() and qemuProcessReconnectAll()
> "together" after the loading of the inactive persistent configs seems
> to keep the code flow more normal. Whether that comes before or after
> the Snapshot/ManagedSave load is, I suppose, just an "implementation
> detail".
>
> Also, other drivers follow the same order: load running, reconnect,
> then load inactive/persistent configs. Should those have similar
> patches as well?

Historically it was safe because reconnecting to QEMU was serialized. At
some point we switched to using a separate thread for reconnecting to
QEMU, which introduced the race. The other drivers are still serialized,
IIUC.

Daniel
--
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list