Cced Danp. I'm sorry for the unconvenience, but my mail server occasionally cuts my CC list for unknown reason. On Fri, Jun 22, 2012 at 11:26:03AM +0800, Hu Tao wrote: > On Thu, Jun 21, 2012 at 09:56:54AM -0600, Jim Fehlig wrote: > > Daniel P. Berrange wrote: > > > On Wed, Jun 20, 2012 at 03:07:16PM -0600, Jim Fehlig wrote: > > > > > >> I'm looking into a libvirtd deadlock on daemon shutdown. The deadlock > > >> occurs when shutting down virNetServer. If the handling of a job is in > > >> flight in virNetServerHandleJob(), the virNetServer lock is acquired > > >> when freeing job->prog (src/rpc/virnetserver.c:167). But the lock is > > >> already held in virNetServerFree(), which is blocked in > > >> virThreadPoolFree() waiting for all the workers to finish. No progress > > >> can be made. > > >> > > >> The attached hack fixes the problem, but I'm not convinced this is an > > >> appropriate fix. Is it necessary to hold the virNetServer lock when > > >> calling virNetServerProgramFree(job->prog)? I notice the lock is not > > >> held in the error path of virNetServerHandleJob(). > > >> > > >> Thanks, > > >> Jim > > >> > > >> > > > > > > > > >> diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c > > >> index ae19e84..edd3196 100644 > > >> --- a/src/rpc/virnetserver.c > > >> +++ b/src/rpc/virnetserver.c > > >> @@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv) > > >> for (i = 0 ; i < srv->nservices ; i++) > > >> virNetServerServiceToggle(srv->services[i], false); > > >> > > >> + virNetServerUnlock(srv); > > >> virThreadPoolFree(srv->workers); > > >> + virNetServerLock(srv); > > >> > > >> for (i = 0 ; i < srv->nsignals ; i++) { > > >> sigaction(srv->signals[i]->signum, &srv->signals[i]->oldaction, NULL); > > >> > > > > > > The virNetServerPtr object is ref-counted, so technical;ly > > > only decrementing the ref count needs to be protected. Once > > > the ref count hits 0, then only the current thread (and the > > > workers) should be using the virNetServerPtr instance. > > > > > > > Ah, right. Thanks for clarifying. > > > > > So, yes, it is safe to call virNetServerUnlock before > > > virThreadPoolFree. Furthermore, it is /not/ required > > > to call virNetServerLock afterwards - no other thread > > > should be using it now the workers are dead. So you > > > can skip that extra lock call, and also remove the > > > unlock call much later > > > > > > > Something like the attached patch? > > > > Regards, > > Jim > > > > > >From 583be33213e922899b23f036494886397b2549dc Mon Sep 17 00:00:00 2001 > > From: Jim Fehlig <jfehlig@xxxxxxxx> > > Date: Thu, 21 Jun 2012 09:21:44 -0600 > > Subject: [PATCH] Fix deadlock on libvirtd shutdown > > > > When shutting down libvirtd, the virNetServer shutdown can deadlock > > if there are in-flight jobs being handled by virNetServerHandleJob(). > > virNetServerFree() will acquire the virNetServer lock and call > > virThreadPoolFree() to terminate the workers, waiting for the workers > > to finish. But in-flight workers will attempt to acquire the > > virNetServer lock, resulting in deadlock. > > > > Fix the deadlock by unlocking the virNetServer lock before calling > > virThreadPoolFree(). This is safe since the virNetServerPtr object > > is ref-counted and only decrementing the ref count needs to be > > protected. Additionally, there is no need to re-acquire the lock > > after virThreadPoolFree() completes as all the workers have > > terminated. > > --- > > src/rpc/virnetserver.c | 6 ++---- > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c > > index ae19e84..9d71e53 100644 > > --- a/src/rpc/virnetserver.c > > +++ b/src/rpc/virnetserver.c > > @@ -766,10 +766,9 @@ void virNetServerFree(virNetServerPtr srv) > > virNetServerLock(srv); > > VIR_DEBUG("srv=%p refs=%d", srv, srv->refs); > > srv->refs--; > > - if (srv->refs > 0) { > > - virNetServerUnlock(srv); > > + virNetServerUnlock(srv); > > At this point other threads may have changed srv->refs... > > > + if (srv->refs > 0) > > ...so it's unsafe to test srv->refs here without locking. > > For example, assume srv->refs is 2 at the beginning of virNetServerFree, > > thread A thread B > > lock srv lock srv (blocked) > > dec srv->refs > > (srv->refs is 1) > > unlock srv > > lock srv > > dec srv->refs > > (srv->refs is 0) > > unlock src > > test srv->refs test srv->refs > > > In this case both threads have found srv->refs is 0 and are going to > free srv... > > Following patch fixes problem. > > >From 25aa97e05aa76054b781a4e5e83781ee16d5afee Mon Sep 17 00:00:00 2001 > From: Hu Tao <hutao@xxxxxxxxxxxxxx> > Date: Fri, 22 Jun 2012 11:15:01 +0800 > Subject: [PATCH] fix a bug of ref count in virnetserver.c > > The test of ref count is not protected by lock, which is unsafe because > the ref count may have been changed by other threads during the test. > > This patch fixes this. > --- > src/rpc/virnetserver.c | 5 +++-- > 1 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c > index 9d71e53..247ddd7 100644 > --- a/src/rpc/virnetserver.c > +++ b/src/rpc/virnetserver.c > @@ -759,15 +759,16 @@ void virNetServerQuit(virNetServerPtr srv) > void virNetServerFree(virNetServerPtr srv) > { > int i; > + int refs; > > if (!srv) > return; > > virNetServerLock(srv); > VIR_DEBUG("srv=%p refs=%d", srv, srv->refs); > - srv->refs--; > + refs = --srv->refs; > virNetServerUnlock(srv); > - if (srv->refs > 0) > + if (refs > 0) > return; > > for (i = 0 ; i < srv->nservices ; i++) > -- > 1.7.4.4 > > > -- > Thanks, > Hu Tao > > -- > libvir-list mailing list > libvir-list@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/libvir-list -- Thanks, Hu Tao -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list