On 09/13/2018 10:11 PM, wang.yechao255@xxxxxxxxxx wrote: > I just code review, found there may be problem. > > The follow statement in founction qemuProcessReconnectHelper: > > "if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) " > > may be failed (no one can guarantee 'virThreadCreate' always success). Not sure how, but your response isn't threaded with your original posting. Makes it difficult to follow. Posting a few bugs related to qemuProcessReconnectHelper - makes me wonder "how" you get yourself into this situation. Is it like the libnuma bug and you're creating scripts that have frequent and continual libvirtd restarts, which I noted there isn't not really a normal situation. In any case, I think in this situation what we want is to call virDomainObjListRemoveLocked instead of virDomainObjListRemove from qemuDomainRemoveInactive because all the prerequisites for calling the Locked function are true (the obj is locked/reffed and the list lock is held). Now this is *not* to say qemuDomainRemoveInactive should always call the *Locked variant. It means adding a "bool useLocked" to the API and then passing true *only* from qemuProcessReconnectHelper and then of course calling the virDomainObjListRemoveLocked variant when the bool is true. Requires changing all the other callers to pass false. Another preferred option: Patch 1. Split up qemuDomainRemoveInactive, such that code from "cfg = virQEMUDriverGetConfig(driver);" through/including "qemuExtDevicesCleanupHost(driver, vm->def);" plus the "virObjectUnref(cfg);" is in a static helper "qemuDomainRemoveInactiveCommon". Patch 2. Create a qemuDomainRemoveInactiveLocked which is a copy of qemuDomainRemoveInactive except that instead of calling virDomainObjListRemove it calls virDomainObjListRemoveLocked. Patch 3. Create a qemuDomainRemoveInactiveJobLocked which copies qemuDomainRemoveInactiveJob except of course calling virDomainObjListRemoveLocked. Patch 4. Modify qemuProcessReconnectHelper to call qemuDomainRemoveInactiveJobLocked. John > > if ‘virThreadCreate’ failed, the follow backstrace we will get: > > > #0 0x00007fa89921203e in pthread_rwlock_wrlock () from > /lib64/libpthread.so.0 > > #1 0x00007fa89ba218e5 in virRWLockWrite (m=<optimized out>) at > util/virthread.c:122 > > #2 0x00007fa89b9f9ebb in virObjectRWLockWrite (anyobj=<optimized out>) > at util/virobject.c:487 > > #3 0x00007fa89ba82a68 in virDomainObjListRemove (doms=0x7fa87411fde0, > dom=0x7fa8740f94f0) at conf/virdomainobjlist.c:400 > > #4 0x00007fa87e1b9ace in qemuDomainRemoveInactive > (driver=driver@entry=0x7fa87411aa20, vm=vm@entry=0x7fa8740f94f0) at > qemu/qemu_domain.c:8309 > > #5 0x00007fa87e1b9c02 in qemuDomainRemoveInactiveJob > (driver=0x7fa87411aa20, vm=0x7fa8740f94f0) at qemu/qemu_domain.c:8331 > > #6 0x00007fa87e1ef36d in qemuProcessReconnectHelper > (obj=0x7fa8740f94f0, opaque=0x7fa87b4b3c30) at qemu/qemu_process.c:8035 > > #7 0x00007fa89ba81e9a in virDomainObjListHelper (payload=<optimized > out>, name=<optimized out>, opaque=0x7fa87b4b3c00) at > conf/virdomainobjlist.c:804 > > #8 0x00007fa89b9ccaa0 in virHashForEach (table=0x7fa87410e520, > iter=iter@entry=0x7fa89ba81e90 <virDomainObjListHelper>, > data=data@entry=0x7fa87b4b3c00) > > at util/virhash.c:580 > > #9 0x00007fa89ba83391 in virDomainObjListForEach (doms=0x7fa87411fde0, > callback=callback@entry=0x7fa87e1ef220 <qemuProcessReconnectHelper>, > > opaque=opaque@entry=0x7fa87b4b3c30) at conf/virdomainobjlist.c:819 > > #10 0x00007fa87e1f1564 in qemuProcessReconnectAll (driver=<optimized > out>) at qemu/qemu_process.c:8056 > > #11 0x00007fa87e227928 in qemuStateInitialize (privileged=true, > callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:919 > > #12 0x00007fa89bb9f91f in virStateInitialize (privileged=true, > callback=callback@entry=0x7fa89c547cd0 <daemonInhibitCallback>, > opaque=opaque@entry=0x7fa89d875c00) > > at libvirt.c:662 > > #13 0x00007fa89c547d2b in daemonRunStateInit (opaque=0x7fa89d875c00) at > remote/remote_daemon.c:803 > > #14 0x00007fa89ba21712 in virThreadHelper (data=<optimized out>) at > util/virthread.c:206 > > #15 0x00007fa89920edc5 in start_thread () from /lib64/libpthread.so.0 > > #16 0x00007fa898b3673d in clone () from /lib64/libc.so.6 > > > frame 8, virHashForEach has called virObjectLock(doms) > > frame 3, virDomainObjListRemove calls virObjectRWLockWrite(doms) again. > > thus deadlock occurs. > > > > 原始邮件 > *发件人:*PeterKrempa <pkrempa@xxxxxxxxxx> > *收件人:*王业超10154425; > *抄送人:*libvir-list@xxxxxxxxxx <libvir-list@xxxxxxxxxx> > *日 期 :*2018年09月13日 19:31 > *主 题 :**Re: [PATCH v2] qemu: fix deadlock if > createqemuProcessReconnect thread failed* > On Thu, Sep 13, 2018 at 19:28:12 +0800, Wang Yechao wrote: >> qemuProcessReconnectHelper has hold the doms lock, if create >> qemuProcessReconnect thread failed, it will get the doms lock >> again to remove the dom from doms list. >> >> add obj->inReconnetCtx flag to avoid deadlock. > > Please describe the situation more or provide a reproducer. > >> >> Signed-off-by: Wang Yechao <wang.yechao255@xxxxxxxxxx> >> --- >> src/conf/domain_conf.h | 1 + >> src/conf/virdomainobjlist.c | 6 ++++-- >> src/qemu/qemu_process.c | 1 + >> 3 files changed, 6 insertions(+), 2 deletions(-) > > > > > -- > libvir-list mailing list > libvir-list@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/libvir-list > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list