"Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote: > A number of bugs conspired together to cause some nasty problems when > a QEMU vm failed to start > > - vm->monitor was not initialized to -1, so when a VM failed to start > the vm->monitor was just '0', and thus we closed FD 0 (libvirtd's stdin) > > - The next client to connect got FD 0 as its socket > > - The first bug struck again, causing the client to be closed even > though libvirt thought it was still open > > - libvirtd now polle on FD=0, which gave back POLLNVAL because it was > closed > > - event.c was not looking for POLLNVAL so it span 100% cpu when this > happened, instead of invoking the callback with an error code > > - virsh was not cleaning up the priv->watiDispatch call upon I/O errors, > so virsh then hung when doing virConenctClose It could also segfault, and it was easy to make it do that for me, every third client call. For reference, here's what I did: LIBVIRT_DEBUG=1 qemud/libvirtd > log 2>&1 & cat <<\EOF > e.xml <domain type='qemu'> <name>E</name> <uuid>d7a5fdbd-cdaf-9455-926a-d65c16db1809</uuid> <memory>219200</memory> <currentMemory>219200</currentMemory> <vcpu>2</vcpu> <os> <type arch='i686' machine='pc'>hvm</type> <boot dev='cdrom'/> </os> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <devices> <emulator>/usr/bin/qemu-system-x86_64</emulator> <disk type='file' device='cdrom'> <source file='NO_SUCH_FILE'/> <target dev='hdc' bus='ide'/> <readonly/> </disk> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes'/> </devices> </domain> EOF $ src/virsh create e.xml libvir: Remote error : no call waiting for reply with serial 3 error: failed to connect to the hypervisor [Exit 1] $ src/virsh create e.xml libvir: Remote error : no call waiting for reply with serial 0 error: failed to connect to the hypervisor [Exit 1] $ src/virsh create e.xml libvir: Remote error : server closed connection error: 
Failed to create domain from e.xml zsh: segmentation fault src/virsh create e.xml FYI, that was due to this code remote_internal.c:6319, while (tmp && tmp->next) where "tmp" is bogus because priv->waitDispatch was freed. Note that this was probably easier for me than most, since I have this in my environment: export MALLOC_PERTURB_=$(($RANDOM % 255 + 1)) > This patch does 3 things > > - Treats POLLNVAL as VIR_EVENT_HANDLE_ERROR, so the callback gets > to see the error & de-registers the client from the event loop > - Add the missing initialization of vm->monitor > - Fix remote_internal.c handling of I/O errors ACK. -- Libvir-list mailing list Libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list
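
For illustration only, the POLLNVAL and vm->monitor points in the quoted
patch description can be sketched as a small standalone C fragment. This is
not the patch under review: the EV_* flags and the helpers poll_to_events(),
close_fd() and check_fd() are hypothetical names made up here, standing in
for libvirt's VIR_EVENT_HANDLE_* values and the logic in event.c. It assumes
a POSIX system with poll().

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical event flags, standing in for libvirt's VIR_EVENT_HANDLE_*. */
enum {
    EV_READABLE = 1 << 0,
    EV_WRITABLE = 1 << 1,
    EV_ERROR    = 1 << 2,
    EV_HANGUP   = 1 << 3
};

/* Translate poll() revents into the hypothetical flags above.
 * POLLNVAL (the fd is not open) is reported as an error, so a callback
 * can unregister the handle instead of the loop spinning on it. */
static int poll_to_events(short revents)
{
    int ev = 0;

    if (revents & POLLIN)
        ev |= EV_READABLE;
    if (revents & POLLOUT)
        ev |= EV_WRITABLE;
    if (revents & (POLLERR | POLLNVAL))
        ev |= EV_ERROR;
    if (revents & POLLHUP)
        ev |= EV_HANGUP;
    return ev;
}

/* Close an fd only if it was ever opened; -1 means "no fd".
 * A zero-initialized field would make this close FD 0 (stdin). */
static void close_fd(int *fd)
{
    if (*fd >= 0)
        close(*fd);
    *fd = -1;
}

/* Poll a single fd without blocking and return the translated events. */
static int check_fd(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };

    if (poll(&p, 1, 0) < 0)
        return EV_ERROR;
    return poll_to_events(p.revents);
}
```

Calling check_fd() on a descriptor that has already been closed should
report EV_ERROR, which gives a callback the chance to unregister the handle.
Starting with fd = -1 makes close_fd() a no-op for a monitor that was never
opened.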