It seems we have the following execution flow:

1. Something calls the function

   static void qemuMonitorIO(int watch, int fd, int events, void *opaque)

   which contains the following piece of code:

   qemu_monitor.c:

   599     if (!error &&
   600         events & VIR_EVENT_HANDLE_HANGUP) {
   601         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
   602                        _("End of file from monitor"));
   603         eof = 1;
   604         events &= ~VIR_EVENT_HANDLE_HANGUP;
   605     }

   Somehow VIR_EVENT_HANDLE_HANGUP is already set by the time we reach this code, so the monitor file gets marked as EOF.

2. A few lines below we come to the following instructions:

   qemu_monitor.c:

   649     if (eof) {
   650         void (*eofNotify)(qemuMonitorPtr, virDomainObjPtr)
   651             = mon->cb->eofNotify;
   652         virDomainObjPtr vm = mon->vm;
   653
   654         /* Make sure anyone waiting wakes up now */
   655         virCondSignal(&mon->notify);
   656         virObjectUnlock(mon);
   657         virObjectUnref(mon);
   658         VIR_DEBUG("Triggering EOF callback");
   659         (eofNotify)(mon, vm);
   660     }

   Here the eofNotify callback is run, which leads us to:

3. qemu_process.c:

   292 static void
   293 qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
   294                             virDomainObjPtr vm)
   295 {
   296     virQEMUDriverPtr driver = qemu_driver;
   297     virDomainEventPtr event = NULL;
   298     qemuDomainObjPrivatePtr priv;
   299     int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
   300     int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN;
   301     const char *auditReason = "shutdown";
   302
   303     VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
   304
   305     qemuDriverLock(driver);
   306     virObjectLock(vm);
   307
   308     priv = vm->privateData;
   309
   310     if (priv->beingDestroyed) {
   311         VIR_DEBUG("Domain is being destroyed, EOF is expected");
   312         goto unlock;
   313     }
   314
   315     if (!virDomainObjIsActive(vm)) {
   316         VIR_DEBUG("Domain %p is not active, ignoring EOF", vm);
   317         goto unlock;
   318     }
   319
   320     if (priv->monJSON && !priv->gotShutdown) {
   321         VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; "
   322                   "assuming the domain crashed", vm->def->name);
   323         eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED;
   324         stopReason = VIR_DOMAIN_SHUTOFF_CRASHED;
   325         auditReason = "failed";
   326     }
   327
   328     event = virDomainEventNewFromObj(vm,
   329                                      VIR_DOMAIN_EVENT_STOPPED,
   330                                      eventReason);
   331     qemuProcessStop(driver, vm, stopReason, 0);
   332     virDomainAuditStop(vm, auditReason);
   333
   334     if (!vm->persistent) {
   335         qemuDomainRemoveInactive(driver, vm);
   336         goto cleanup;
   337     }
   338
   339 unlock:
   340     virObjectUnlock(vm);
   341
   342 cleanup:
   343     if (event)
   344         qemuDomainEventQueue(driver, event);
   345     qemuDriverUnlock(driver);
   346 }

   … where we mistakenly stop the domain with the following call:

   331     qemuProcessStop(driver, vm, stopReason, 0);

This flow seems to be the only one possible. So the question is: how can a running domain generate a VIR_EVENT_HANDLE_HANGUP event? Could there be a race condition, or some other reason, that causes this state on libvirtd startup? (A small standalone sketch of how a peer hangup surfaces as a HANGUP event is included after the quoted message below.)

Thank you for your help.

On 02.03.2013, at 13:35, Igor Lukyanov <igor@xxxxxxxxxxxx> wrote:

> Hello.
> We notice a very upsetting behavior of how libvirtd deals with running domains after being restarted.
> Namely, in 10-20s after start libvirtd may accidentally kill a random number of running (qemu) domains with SIGTERM.
>
> Here is what we get in the libvirt log:
>
> 2013-02-27 12:35:10.415+0000: 45777: info : libvirt version: 1.0.2
> 2013-02-27 12:35:10.415+0000: 45777: error : virNWFilterSnoopLeaseFileRefresh:1903 : open("/var/run/libvirt/network/nwfilter.ltmp"): No such file or directory
> 2013-02-27 12:35:21.772+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:21.772+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet34" not in key map
> 2013-02-27 12:35:21.905+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet34: No such device
> 2013-02-27 12:35:22.029+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet35" not in key map
> 2013-02-27 12:35:22.032+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet35: No such device
> 2013-02-27 12:35:22.430+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:22.430+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet4" not in key map
> 2013-02-27 12:35:22.433+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet4: No such device
> 2013-02-27 12:35:22.554+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet5" not in key map
> 2013-02-27 12:35:22.557+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet5: No such device
> 2013-02-27 12:35:22.794+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:22.794+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet10" not in key map
> 2013-02-27 12:35:22.797+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet10: No such device
> 2013-02-27 12:35:22.944+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet11" not in key map
> 2013-02-27 12:35:22.947+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet11: No such device
> 2013-02-27 12:35:23.194+0000: 45766: error : qemuMonitorIO:602 : internal error End of file from monitor
> 2013-02-27 12:35:23.194+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet14" not in key map
> 2013-02-27 12:35:23.197+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet14: No such device
> 2013-02-27 12:35:23.361+0000: 45766: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet15" not in key map
> 2013-02-27 12:35:23.364+0000: 45766: error : virNetDevGetIndex:653 : Unable to get index for interface vnet15: No such device
>
> And this is what we get by grepping 45766 in the separate per-VM logs:
>
> drafter@10-5-0-11:/var/log/libvirt/qemu$ sudo grep -A 1 45766 *
> vm010-001-206-002.log:qemu: terminating on signal 15 from pid 45766
> vm010-001-206-002.log-2013-02-27 12:35:22.430+0000: shutting down
> --
> vm010-001-207-002.log:qemu: terminating on signal 15 from pid 45766
> vm010-001-207-002.log-2013-02-27 12:35:22.794+0000: shutting down
> --
> vm010-001-228-006.log:qemu: terminating on signal 15 from pid 45766
> vm010-001-228-006.log-2013-02-27 12:35:23.194+0000: shutting down
> --
> vm010-002-011-002.log:qemu: terminating on signal 15 from pid 45766
> vm010-002-011-002.log-2013-02-27 12:35:21.772+0000: shutting down
>
> So here libvirt killed 4 VMs right after starting.
> Before that occasion we just shut libvirtd off and started it again (stop/start).
>
> We will be very grateful for any ideas on this matter.
> --
> wbr, Igor Lukyanov
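
P.S. To make the question more concrete, here is a minimal standalone sketch (not libvirt code) of how a peer closing or resetting its end of a UNIX-domain stream socket shows up as POLLHUP in poll(). Assuming libvirtd's default poll()-based event loop forwards POLLHUP as VIR_EVENT_HANDLE_HANGUP, this would be the path by which qemuMonitorIO ends up seeing a hangup on the monitor socket. The SKETCH_* flags and the poll_revents_to_flags() helper are made up here purely for illustration; only their numeric values are chosen to mirror virEventHandleType.

/* Standalone illustration, not libvirt code: a closed peer on a UNIX
 * stream socket is reported by poll() as POLLHUP, which an event loop
 * would then hand to the monitor callback as a HANGUP event. */
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-ins; values chosen to mirror virEventHandleType. */
#define SKETCH_EVENT_READABLE 1
#define SKETCH_EVENT_HANGUP   8

static int poll_revents_to_flags(short revents)
{
    int flags = 0;
    if (revents & POLLIN)
        flags |= SKETCH_EVENT_READABLE;
    if (revents & POLLHUP)
        flags |= SKETCH_EVENT_HANGUP;  /* peer closed / connection reset */
    return flags;
}

int main(void)
{
    int sv[2];
    struct pollfd pfd;

    /* Stand-in for the qemu monitor UNIX socket connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }

    close(sv[1]);  /* the "qemu side" goes away */

    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    if (poll(&pfd, 1, 1000) > 0 &&
        (poll_revents_to_flags(pfd.revents) & SKETCH_EVENT_HANGUP))
        printf("HANGUP reported -> monitor would be treated as EOF\n");

    close(sv[0]);
    return 0;
}

If that assumption about the event loop is right, the question reduces to what closes or resets the monitor sockets of healthy domains while libvirtd is reconnecting to them after a restart.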