From: "Daniel P. Berrange" <berrange@xxxxxxxxxx>

The past 24 hours have seen a flurry of libvirtd crash reports from
Fedora users.

  https://bugzilla.redhat.com/show_bug.cgi?id=1014933

In one thread we have the libvirtd daemon startup code running, and
it is in the middle of QEMU state initialization.

#9  0xb00882e4 in qemuStateInitialize (privileged=true, callback=0xb77a0420 <daemonInhibitCallback>, opaque=0xb8b1fc98) at qemu/qemu_driver.c:595
        driverConf = 0xaf5afcd8 "/etc/libvirt/qemu.conf"
        conn = 0x0
        ebuf = "\000\260\025\267\024\071P\257\214\000\000\000\360\316\341\257\335\242\023\267\214\000\000\000\210\177X\257\001\000\000\000l\000\000\000\360\316\341\257\000\260\025\267\264\316\341\257\210\177X\257$\316\341\257$\316\341\257l\000\000\000\304\316\341\257\201\321LRl\000\000\000\235R\022\267\000)\233\351\260\316\341\257\000\000\000\000\253G\022\267\000\260\025\267\340\316\341\257\a\000\000\000\v\260\023\267\000\260\025\267\001\000\000\000\254\325\334\266\000\260\025\267\214\261\023\267 :P\257\037:P\257\000\000\000\000/\261\023\267\000\260\025\267uc\334\266\000\260\025\267A\262\023\267\037:P\257\000\000\000\000\001\000\000\000\000\000\000\000\340\316\341\257\334\316\341\257\001\000\000\000\001\000\000\000\033c\024\267"...
        membase = 0x0
        mempath = 0x0
        cfg = 0xaf509050
        run_uid = 4294967295
        run_gid = 4294967295
        __func__ = "qemuStateInitialize"
        __FUNCTION__ = "qemuStateInitialize"
#10 0xb74c5325 in virStateInitialize (privileged=true, callback=callback@entry=0xb77a0420 <daemonInhibitCallback>, opaque=opaque@entry=0xb8b1fc98) at libvirt.c:833
        i = 6
        __func__ = "virStateInitialize"
#11 0xb77a049e in daemonRunStateInit (opaque=opaque@entry=0xb8b1fc98) at libvirtd.c:876
        srv = 0xb8b1fc98
        __func__ = "daemonRunStateInit"

In another thread, we have a dbus event being handled by the nwfilter
driver, and the nwfilter driver calls into the QEMU driver... which
has not finished initializing itself yet!

Thread 1 (Thread 0xb6366ac0 (LWP 7041)):
#0  0xb0052861 in virQEMUCloseCallbacksGetForConn (closeCallbacks=0x0, conn=0xb8b2cc20) at qemu/qemu_conf.c:861
        list = 0xb8ac57e8
        data = {conn = 0xb8b2cc20, list = 0xb8ac57e8, oom = false}
#1  virQEMUCloseCallbacksRun (closeCallbacks=0x0, conn=conn@entry=0xb8b2cc20, driver=0xaf50b350) at qemu/qemu_conf.c:890
        list = 0xb8b2cc20
        i = <optimized out>
        __func__ = "virQEMUCloseCallbacksRun"
#2  0xb009df3b in qemuConnectClose (conn=0xb8b2cc20) at qemu/qemu_driver.c:1057
        driver = <optimized out>
#3  0xb74babc1 in virConnectDispose (obj=0xb8b2cc20) at datatypes.c:159
        conn = 0xb8b2cc20
#4  0xb742f22c in virObjectUnref (anyobj=anyobj@entry=0xb8b2cc20) at util/virobject.c:264
        klass = 0xb8b2cba0
        obj = 0xb8b2cc20
        lastRef = true
        __func__ = "virObjectUnref"
#5  0xb74c5811 in virConnectClose (conn=conn@entry=0xb8b2cc20) at libvirt.c:1503
        __func__ = "virConnectClose"
        __FUNCTION__ = "virConnectClose"
#6  0xb023424e in nwfilterStateReload () at nwfilter/nwfilter_driver.c:301
        conn = 0xb8b2cc20
#7  0xb02342fc in nwfilterFirewalldDBusFilter (connection=0xaf501038, message=0xaf503910, user_data=0x0) at nwfilter/nwfilter_driver.c:90
        __func__ = "nwfilterFirewalldDBusFilter"
#8  0xb711efb9 in dbus_connection_dispatch (connection=0xaf501038) at dbus-connection.c:4631
        filter = <optimized out>
        next = 0x0
        message = 0xaf503910
        link = <optimized out>
        filter_list_copy = 0xaf5009dc
        message_link = 0xaf500a18
        result = DBUS_HANDLER_RESULT_NOT_YET_HANDLED
        pending = <optimized out>
        reply_serial = <optimized out>
        status = <optimized out>
        found_object = 3071507249
        __FUNCTION__ = "dbus_connection_dispatch"
#9  0xb740caeb in virDBusWatchCallback (fdatch=fdatch@entry=8, fd=15, events=1, opaque=0xaf500ca8) at util/virdbus.c:144
        watch = 0xaf500ca8
        info = 0xaf500de0
        dbus_flags = 1

This DBus event is triggered when the firewalld driver is reloaded, or
restarted. I confirmed this analysis by adding a sleep(10) to the QEMU
driver startup code, and then triggering a firewalld restart.
Sure enough it crashed & burned with the same trace.

The reason why it has suddenly hit us is that we are unlucky enough
to have a firewalld update in the Fedora repos at the same time as a
libvirt update, and lots of people are pulling both updates down in
one yum transaction!

After wasting time figuring out how to avoid the race condition with
mutexes and other synchronization ideas, I realized that the nwfilter
code was in fact bogus. The only reason it gets a virConnectPtr is so
that the code for reloading filters can access its nwfilterPrivateData
field to get the virNWFilterDriverStatePtr object instance. This is
insanely convoluted, since the nwfilter driver can trivially pass the
driver state instance into the virNWFilterConfLayerInit method at
startup.

These patches therefore rip out all use of virConnectPtr from the
nwfilter driver code, which avoids the race with the QEMU driver
initialization code. This also fixes the nwfilter driver in cases
where the QEMU driver is disabled, but the LXC driver still wants to
use nwfilter.

Daniel P. Berrange (3):
  Remove virConnectPtr arg from virNWFilterDefParse*
  Don't pass virConnectPtr in nwfilter 'struct domUpdateCBStruct'
  Remove use of virConnectPtr from all remaining nwfilter code

 src/conf/nwfilter_conf.c               | 78 ++++++++++++++++------------------
 src/conf/nwfilter_conf.h               | 24 ++++-------
 src/lxc/lxc_driver.c                   |  3 +-
 src/nwfilter/nwfilter_dhcpsnoop.c      | 12 +++---
 src/nwfilter/nwfilter_driver.c         | 49 +++++++++------------
 src/nwfilter/nwfilter_gentech_driver.c | 32 +++++++-------
 src/nwfilter/nwfilter_gentech_driver.h | 10 ++---
 src/nwfilter/nwfilter_learnipaddr.c    |  6 +--
 src/qemu/qemu_driver.c                 |  6 ++-
 src/uml/uml_driver.c                   |  3 +-
 tests/nwfilterxml2xmltest.c            |  2 +-
 11 files changed, 102 insertions(+), 123 deletions(-)

-- 
1.8.3.1

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
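
For readers skimming the cover letter, the shape of the fix can be
sketched in a few lines of C. This is only an illustration, not code
from the patches: demo_driver_state, demo_conf_layer_init and
demo_reload_filters are hypothetical names standing in, very loosely,
for the virNWFilterDriverStatePtr object, virNWFilterConfLayerInit and
the filter reload path. The assumption is simply that the config layer
is handed the driver state once at startup and keeps it, so a later
reload never has to open a connection to another driver to find it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the nwfilter driver state object.
 * The fields are invented for the demo, not taken from libvirt. */
struct demo_driver_state {
    const char *config_dir;
    int reload_count;
};

/* Driver state remembered by the (hypothetical) config layer. */
static struct demo_driver_state *demo_conf_state = NULL;

/* Called once at driver startup: the driver passes its own state in
 * directly, rather than having it dug out of a connection later. */
int demo_conf_layer_init(struct demo_driver_state *state)
{
    if (state == NULL)
        return -1;
    demo_conf_state = state;
    return 0;
}

/* Reload path, e.g. run from an event callback. It uses only the
 * stored state and touches no other driver. Returns the number of
 * reloads so far, or -1 if the layer was never initialized. */
int demo_reload_filters(void)
{
    if (demo_conf_state == NULL)
        return -1;
    demo_conf_state->reload_count++;
    return demo_conf_state->reload_count;
}
```

With this layout a reload that arrives before initialization fails
cleanly with -1 rather than reaching into a half-initialized driver,
and one that arrives afterwards depends on nothing outside the
filter code's own state.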