On Fri, Aug 07, 2015 at 10:18:59AM +0200, Frank Schreuder wrote: > Hi Laine, > > I'm sorry for starting 3 threads with the same topic, seems like my mail was > heavily delayed yesterday. > > [ removing personal mail ] > > Op 8/7/2015 om 12:24 AM schreef Laine Stump: > >>>Subject: Reproducible live disk detach bug > >>> > >>>Hey guys, > >>> > >>>I am currently running into a reproducible libvirt bug when I detach a > >>>disk from a running VM using libvirt 1.2.18 and qemu 2.3.0. > >>> > >>>I can attach a disk to a running VM without any problems. > >>>During the attach operation I don't create a disk alias name, which > >>>results in libvirt generating an unique alias. > >>>Using virsh dumpxml I get the following output(snippet): > >>> > >>><disk type='file' device='disk'> > >>> <driver name='qemu' type='raw' cache='none'/> > >>> <source file='/var/local/mnt/test-storage/test-storage.raw'/> > >>> <backingStore/> > >>> <target dev='vde' bus='virtio'/> > >>> <iotune> > >>> <read_bytes_sec>94371840</read_bytes_sec> > >>> <write_bytes_sec>31457280</write_bytes_sec> > >>> <read_iops_sec>1000</read_iops_sec> > >>> <write_iops_sec>200</write_iops_sec> > >>> </iotune> > >>> <alias name='virtio-disk4'/> > >>> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> > >>></disk> > >>> > >>>As you can see, libvirt generates 'virtio-disk4' as an alias. But as > >>>soon as I try to detach the disk from the running VM, I will receive the > >>>following two errors in my libvirt log: > >>> > >>>2015-08-06 10:48:59.943+0000: 11657: warning : > >>>qemuMonitorJSONHandleDeviceDeleted:939 : missing device in device > >>>deleted event > >>>2015-08-06 10:48:59.945+0000: 11662: error : > >>>qemuMonitorTextDriveDel:2594 : operation failed: deleting > >>>drive-virtio-disk4 drive failed: 2015-08-06T10:48:59.945058Z Device > >>>'drive-virtio-disk4' not found > >>> > >>>Apparently libvirt tries to detach the disk with a drive- prefix, which > >>>doesn't exist? > >I can't tell you right off why the failure message is there, but the > >alias with the added "drive-" is easily explained. > > > >Each <disk> element in the libvirt config corresponds to two entities in > >the qemu virtual machine - a "device" component (what is seen by the > >guest) and a "drive" component (what resources are used on the host to > >back up that device). Each of these corresponds to a separate qemu > >commandline argument (named "-device" and "-drive"), and each of those > >must have a different, unique "id" option. For example, this is what is > >added to my qemu commandline when I add a virtio disk named > >"rawhide.qcow2" to my virtual machine: > > > >-drive > >file=/var/lib/libvirt/images/rawhide.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 > >-device > >virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 > > > > > >If you look at the status XML of the domain while it is running, you'll > >see that the <alias> for the disk element is "virtio-disk0", and that's > >what is given to the -device half of the disk as its qemu id. The -drive > >half uses "drive-$alias" as its qemu id though (you'll notice that the > >-device half references that id with its "drive" option - that's how the > >two halves are tied together by qemu). > > > >So, there is nothing unusual about qemu wanting to delete something > >called drive-virtio-disk4 when you have a <disk> with > >alias='virtio-disk4'. It obviously isn't right that an error message is > >being logged. Something sounds familiar about the "missing device in > >device deleted event" message, but I can't recall details. Does anyone > >else have a better memory? Frank - would it be possible for you to try > >this same sequence of events with the same libvirt and an older qemu > >(and/or vice-versa)? > > > > I have some more interesting debug information. > > I'm -not- able to reproduce it on libvirt 1.2.11 - qemu 2.3.0 and libvirt > 1.2.11 - qemu 1.7.1 using a FreeBSD VM. > I'm able to reproduce the errors with libvirt 1.2.11 - qemu 2.3.0 and > libvirt 1.2.18 - qemu 2.3.0 using a Debian VM. > Seems like the error triggers depending on the guest OS? > > During hot-unplugging of network interfaces libvirt also triggers the same > error on a Debian VM: > > 2015-08-06 10:48:59.945+0000: 11662: error : qemuMonitorJSONHandleDeviceDeleted:939 : missing device in device deleted event > > > I gathered some logging information with info level logging enabled while > detaching a disk: > http://pastebin.com/raw.php?i=vVLuavG2 The interesting this is this line: 2015-08-06 21:14:56.847+0000: 8037: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f9477cc0fc0 buf={"return": "2015-08-06T21:14:56.847359Z Device 'drive-virtio-disk4' not found\r\n", "id": "libvirt-12"} Notice how the data read from the monitor command has a timestamp prefixed on it. This is very bad. We absolutely do not expect to see a timestamp on HMP monitor replies, so whatever is doing that in QEMU needs fixing. AFAIK, QEMU is only supposed to add timestamps to stuff it prints on stderr, when -msg timestamp=on. It should never timestamp HMP Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list