Re: [BUG?] qemu was accidentally killed by libvirtd after we restart libvirtd by service command

On 12/17/2015 10:45 AM, Hailiang Zhang wrote:
Hi all,

We came across a problem: qemu was accidentally killed by libvirtd after we restarted libvirtd with the service command.
Steps to reproduce:
1) Attach a net device to the VM with the 'virsh attach-device' command.
2) Detach the net device added in step 1 with the 'virsh detach-device' command.
3) Repeat the above two steps, and at the same time restart the libvirtd daemon. There is then a chance that, after one of these restarts, qemu is killed accidentally.

The libvirtd log messages are:
[2015-12-17 11:34:36]: libvirtd : 216046: info : libvirt version: 1.2.17
[2015-12-17 11:34:36]: libvirtd : 216046: warning : virDriverLoadModule:65 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not accessible
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:662 : starting netlink event service with protocol 0
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:712 : set netlink socket big buffer size
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:662 : starting netlink event service with protocol 15
[2015-12-17 11:34:36]: libvirtd : 216046: info : virNetlinkEventServiceStart:712 : set netlink socket big buffer size
[2015-12-17 11:34:36]: libvirtd : 216079: error : dnsmasqCapsRefreshInternal:741 : Cannot check dnsmasq binary /usr/sbin/dnsmasq: No such file or directory
[2015-12-17 11:34:36]: libvirtd : 216079: info : networkReloadFirewallRules:1861 : Reloading iptables rules
[2015-12-17 11:34:36]: libvirtd : 216079: info : networkRefreshDaemons:1830 : Refreshing network daemons
[2015-12-17 11:34:36]: libvirtd : 216079: error : virNodeSuspendSupportsTarget:332 : internal error: Cannot probe for supported suspend types
[2015-12-17 11:34:36]: libvirtd : 216079: warning : virQEMUCapsInit:1037 : Failed to get host power management capabilities
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23039 : Scanning for configs in /var/run/libvirt/qemu
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23039 : Scanning for configs in /etc/libvirt/qemu
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
[2015-12-17 11:34:36]: libvirtd : 216079: info : qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName in /var/lib/libvirt/qemu/snapshot/VMName
[2015-12-17 11:34:36]: libvirtd : 216079: info : qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName2 in /var/lib/libvirt/qemu/snapshot/VMName2
[2015-12-17 11:34:36]: libvirtd : 216079: info : qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName3 in /var/lib/libvirt/qemu/snapshot/VMName3
[2015-12-17 11:34:36]: libvirtd : 216047: info : remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by client: 127.0.0.1;0
[2015-12-17 11:34:36]: libvirtd : 216050: info : remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by client: 127.0.0.1;0
[2015-12-17 11:34:36]: libvirtd : 216047: info : virDomainDetachDevice:8508 : enter virDomainDetachDevice domainname=VMName2, xml=
<interface type='bridge'>
  <source bridge='br-1'/>
  <mac address='00:16:3e:76:85:17'/>
  <virtualport type='openvswitch'/>
  <model type='virtio'/>
</interface>

[2015-12-17 11:34:36]: libvirtd : 216050: info : virDomainDetachDevice:8508 : enter virDomainDetachDevice domainname=VMName, xml=
<interface type='bridge'>
  <source bridge='br-1'/>
  <mac address='00:16:3e:76:85:15'/>
[2015-12-17 11:34:36]: libvirtd : 216095: error : qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU command 'netdev_del': Device 'hostnet5' not found
[2015-12-17 11:34:36]: libvirtd : 216096: error : qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU command 'netdev_del': Device 'hostnet2' not found
[2015-12-17 11:34:36]: libvirtd : 216097: error : qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU command 'netdev_del': Device 'hostnet1' not found
[2015-12-17 11:34:37]: libvirtd : 216095: info : virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC user and group on '/data/suse11_sp3_64_1'
[2015-12-17 11:34:37]: libvirtd : 216095: info : virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on '/data/suse11_sp3_64_1' to '0:0'
[2015-12-17 11:34:37]: libvirtd : 216096: info : virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC user and group on '/data/suse11_sp3_64_2'
[2015-12-17 11:34:37]: libvirtd : 216096: info : virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on '/data/suse11_sp3_64_2' to '0:0'
[2015-12-17 11:34:37]: libvirtd : 216095: info : qemuProcessStop:5720 : VM 'VMName' is dying,remove vBMC
[2015-12-17 11:34:37]: libvirtd : 216095: error : rtev_vBMC_send:209 : Failed to remove domain vBMC bind path for 33ae1cb0-c451-4620-8cee-1bbddd124b82, errno is 2
[2015-12-17 11:34:37]: libvirtd : 216050: info : remoteDispatchDomainAttachDevice:2855 : Domain is attached a device by client: 127.0.0.1;0
[2015-12-17 11:34:37]: libvirtd : 216050: info : virDomainAttachDevice:8390 : enter virDomainAttachDevice domainname=VMName, xml=<interface type='bridge'>

The VM's qemu log is:
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: device_del, qmp_cmd_arguments: {"id": "net4"}
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1450352070, "microseconds": 402473}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net4/virtio-backend"}}
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1450352070, "microseconds": 402586}, "event": "DEVICE_DELETED", "data": {"device": "net4", "path": "/machine/peripheral/net4"}}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: netdev_del, qmp_cmd_arguments: {"id": "hostnet4"}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: qom-list, qmp_cmd_arguments: {"path": "/machine/peripheral"}
---------------------------------------- At this point we restarted libvirtd ----------------------------------------
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: device_del, qmp_cmd_arguments: {"id": "net5"}
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status is 1 that means ACKNOWLEDGE
[2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1450352070, "microseconds": 810030}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net5/virtio-backend"}}
[2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1450352070, "microseconds": 810150}, "event": "DEVICE_DELETED", "data": {"device": "net5", "path": "/machine/peripheral/net5"}}
[2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
[2015-12-17 11:34:35] send_control_event:225 virtio serial port 2 send control message event = 6, value = 0
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: qmp_capabilities
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-commands
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: add-fd, qmp_cmd_arguments: {"fdset-id": 0, "opaque": "/dev/null"}
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: block-commit, qmp_cmd_arguments: {"device": "bogus"}
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-events
[2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send control message event = 6, value = 1
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-status
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-block
[2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-chardev
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: qom-list, qmp_cmd_arguments: {"path": "/machine/peripheral"}
[2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
[2015-12-17 11:34:36]: shutting down
[2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send control message event = 6, value = 0
qemu: terminating on signal 15 from pid 216046(libvirtd)
[2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1450352076, "microseconds": 267427}, "event": "SHUTDOWN"}
[2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1450352076, "microseconds": 267773}, "event": "DEVICE_TRAY_MOVED", "data": {"device": "drive-ide0-0-1", "tray-open": true}}

We have analyzed this problem and believe it is triggered by inconsistent state for the net device
between qemu and the newly started libvirtd.
After libvirtd starts, it reconnects to qemu and updates its record of the guest's devices with the 'qom-list'
command; it also reads the saved configuration from the stored XML.
The call path in the code is:

qemuProcessReconnect
  -> qemuProcessUpdateDevices
       if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
           goto cleanup;

       if ((tmp = old)) {
           while (*tmp) {
               if (!virStringArrayHasString(priv->qemuDevices, *tmp) &&
                   virDomainDefFindDevice(vm->def, *tmp, &dev, false) == 0 &&
                   qemuDomainRemoveDevice(driver, vm, &dev) < 0) {
                   goto cleanup;  /* If we fail here, qemuProcessUpdateDevices()
                                     returns -1, and qemuProcessReconnect() then
                                     calls qemuProcessStop() to kill the VM. */
               }
               tmp++;
           }
       }
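
To make the failure mode concrete, here is a small standalone model of the
reconciliation step above. It is only an illustration: the device lists and the
helpers list_has()/remove_device() are invented for the demo, and only the
control flow mirrors the libvirt code.

#include <stdio.h>
#include <string.h>

/* Devices recorded in the status XML before libvirtd went down. */
static const char *old_devices[] = { "net0", "net5", "video0", NULL };

/* Devices QEMU reports via 'qom-list' after libvirtd comes back;
 * 'net5' is gone because its device_del completed during the restart. */
static const char *qemu_devices[] = { "net0", "video0", NULL };

static int list_has(const char **list, const char *name)
{
    for (; *list; list++)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}

/* Models qemuDomainRemoveDevice() failing: the backend ('hostnet5')
 * was already deleted before the restart, so netdev_del fails. */
static int remove_device(const char *name)
{
    fprintf(stderr, "netdev_del for backend of '%s': Device not found\n",
            name);
    return -1;
}

int main(void)
{
    /* Anything in the old list but absent from QEMU's list was
     * hot-unplugged while libvirtd was away; finish the teardown. */
    for (const char **tmp = old_devices; *tmp; tmp++) {
        if (!list_has(qemu_devices, *tmp) && remove_device(*tmp) < 0) {
            /* This is the 'goto cleanup' above: the reconnect fails
             * and the (perfectly healthy) VM gets killed. */
            printf("reconnect failed -> VM is killed\n");
            return 1;
        }
    }
    printf("reconnect succeeded\n");
    return 0;
}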
This problem is a little hard to reproduce naturally, but we can reproduce it reliably in a contrived way:
1) Attach a net device with the 'virsh attach-device' command.
2) Stop libvirtd.
3) Detach the net device by sending QMP commands directly to QEMU through the monitor, like this:
# ./qmp-shell /var/lib/libvirt/qemu/redhat.monitor
Welcome to the QMP low-level shell!
Connected to QEMU 2.3.0

(QEMU) qom-list path=/machine/peripheral
{u'return': [{u'type': u'child<virtio-net-pci>', u'name': u'net1'}, {u'type': u'child<cirrus-vga>', u'name': u'video0'}, {u'type': u'child<usb-tablet>', u'name': u'input0'}, {u'type': u'child<isa-serial>', u'name': u'serial0'}, {u'type': u'child<virtio-net-pci>', u'name': u'net0'}, {u'type': u'child<scsi-hd>', u'name': u'scsi0-0-0-0'}, {u'type': u'child<virtio-scsi-pci>', u'name': u'scsi0'}, {u'type': u'child<piix3-usb-uhci>', u'name': u'usb'}, {u'type': u'string', u'name': u'type'}]}
(QEMU) device_del id=net1
{u'return': {}}
(QEMU) netdev_del id=hostnet1
{u'return': {}}
4) Restart the libvirtd daemon, and you will find that qemu has been killed by libvirtd.

The root cause of the inconsistent state is that we restarted libvirtd before it had finished the
'detach device' work.
Killing such a VM outright seems rather drastic.
Could libvirtd instead finish the outstanding cleanup work for the VM after it starts up? (IMHO, we cannot guarantee that libvirtd will never be restarted while it is in the middle of detaching a device.)
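
A minimal sketch of the direction we have in mind, written against the
standalone model above rather than as a real libvirt patch: a failure to finish
the host-side teardown of a device that QEMU has already deleted would be
logged and skipped instead of aborting the reconnect.

/* Replacement for the loop in the model above (illustrative only):
 * a device already gone from QEMU should not abort the reconnect. */
for (const char **tmp = old_devices; *tmp; tmp++) {
    if (!list_has(qemu_devices, *tmp) && remove_device(*tmp) < 0) {
        /* Host-side cleanup is incomplete, but the guest is healthy;
         * warn and continue instead of killing the VM. */
        fprintf(stderr, "warning: incomplete cleanup of '%s', continuing\n",
                *tmp);
        continue;
    }
}
printf("reconnect succeeded despite stale devices\n");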

Thanks for the detailed debugging and analysis!

While I don't think that libvirt should be required to deal properly with someone going into the qmp shell and detaching a device while libvirtd is down, it *should* properly handle the case where libvirtd is restarted while a device detach is in progress (I had thought that worked properly, but confess I've never actually tried it).

So that we don't lose track of it, can you file a bug for this in the libvirt upstream bugzilla:

  https://libvirt.org/bugs.html

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



