Re: Symptoms of main loop slowing down in libvirtd

Prerna <saxenap.ltc@xxxxxxxxx> · Tue, 2 May 2017 16:42:11 +0530

On Tue, May 2, 2017 at 4:27 PM, Peter Krempa <pkrempa@xxxxxxxxxx> wrote:
On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:

> On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa@xxxxxxxxxx> wrote:

>

> > On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:

> >

> > [please don't top-post on technical lists]

> >

> > > Thanks for the quick response Peter !

> > > This ratifies the basic approach I had in mind.

> > > It needs some (not-so-small) cleanup of the qemu driver code, and I have

> > > already started cleaning up some of it. I am planning to have a constant

> > > number of event handler threads to start with. I'll try adding this as a

> > > configurable parameter in qemu.conf once basic functionality is

> > completed.

> >

> > That is wrong, since you can't guarantee that it will not lock up. Since

> > the workers handling monitor events tend to call monitor commands

> > themselves it's possible that it will get stuck due to unresponsive

> > qemu. Without having a worst-case-scenario of a thread per VM you can't

> > guarantee that the pool won't be depleted.

> >

>

> Once a worker thread "picks" an event, it will contend on the per-VM lock

> for that VM. Consequently, the handling for that event will be delayed

> until an existing RPC call for that VM completes.

>

>

> >

> > If you want to fix this properly, you'll need a dynamic pool.

> >

>

> To improve the efficiency of the thread pool, we can try contending for a

> VM's lock for a specific time, say, N seconds, and then relinquish the

> lock. The same thread in the pool can then move on to process events of the

> next VM.

This would unnecessarily delay events which are not locked.

> Note that this needs all VMs to be hashed to a constant number of threads

> in the pool, say 5. This ensures that each worker thread has a unique,

> non-overlapping set of VMs to work with.

How would this help?

> As an example, VM IDs {1, 6, 11, 16, 21, ...} are handled by the same worker

> thread. If this particular worker thread cannot find the requisite VM's

> lock, it will move on to the event list for the next VM and so on. The use

> of pthread_mutex_trylock() ensures that the worker thread will never be stuck

> forever.

No, I think this isn't the right approach at all. You could end up

having all VMs handled by one thread, with the others being idle. I think

the right approach will be to have a dynamic pool, which will handle

incoming events. In case two events for a single VM should be

handled in parallel, the same thread should pick them up in order they

arrived. In that way, you will have at most a thread per VM, while

normally you will have only one.
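
For reference, the ordering rule described above (events for one VM are
picked up by the same thread, in arrival order, so at most one thread is
ever busy per VM) could be sketched roughly as below. This is only an
illustration under assumptions: struct vm_queue, vm_queue_push() and
vm_queue_pop() are hypothetical names, not existing libvirt code, and the
dynamic pool that actually spawns workers is left out.

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 16            /* assumed fixed capacity, for brevity */

/* Hypothetical per-VM event queue; not a libvirt structure. */
struct vm_queue {
    pthread_mutex_t qlock;      /* protects only the fields below */
    int events[QUEUE_CAP];      /* FIFO ring buffer of event ids */
    unsigned head, count;
    bool draining;              /* true while some worker owns this VM */
};

static void vm_queue_init(struct vm_queue *q)
{
    pthread_mutex_init(&q->qlock, NULL);
    q->head = q->count = 0;
    q->draining = false;
}

/* Queue an event. Returns 1 if the caller must hand this VM to a new
 * worker, 0 if a worker already draining the VM will pick the event up,
 * and -1 if the queue is full. */
static int vm_queue_push(struct vm_queue *q, int ev)
{
    int rc;

    pthread_mutex_lock(&q->qlock);
    if (q->count == QUEUE_CAP) {
        rc = -1;
    } else {
        q->events[(q->head + q->count) % QUEUE_CAP] = ev;
        q->count++;
        rc = q->draining ? 0 : 1;
        q->draining = true;
    }
    pthread_mutex_unlock(&q->qlock);
    return rc;
}

/* Called by the owning worker. Fetches the next event in arrival order;
 * when the queue is empty it gives up ownership and returns false, so
 * the worker can go back to the pool. */
static bool vm_queue_pop(struct vm_queue *q, int *ev)
{
    bool ok;

    pthread_mutex_lock(&q->qlock);
    ok = q->count > 0;
    if (ok) {
        *ev = q->events[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    } else {
        q->draining = false;
    }
    pthread_mutex_unlock(&q->qlock);
    return ok;
}
```

The queue lock here is separate from the per-VM lock, so queuing an event
never waits behind a stuck monitor command; only the worker that owns the
VM can block on it.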

I agree that a dynamic thread pool helps when events from distinct VMs need to be processed at the same time.
But I am also concerned about using the threads in this pool efficiently. If a few threads simply block on per-VM locks until the RPCs for those VMs complete, that is not a very efficient use of resources. I would rather have such a thread set this VM's events aside for now and do something useful while it cannot grab the VM's lock.
This is why I wanted to load-balance incoming events by VM ID and hash them onto distinct threads. The idea was that a thread always has something else to take up if the current VM's lock is not available. Would you have any suggestions for improving the efficiency of the thread pool as a whole?
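
To make the idea concrete, here is a rough sketch of what I have in mind,
not actual qemu driver code. The names (struct vm, worker_for_vm(),
worker_pass()) and the sizes are made up for illustration, and a plain
counter stands in for the per-VM event list. Each worker walks only the VMs
hashed to it and uses pthread_mutex_trylock() so that a VM whose lock is
held by an RPC is skipped rather than waited on.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 5           /* assumed pool size, as in the example */
#define MAX_VMS     32          /* assumed upper bound, for illustration */

/* Hypothetical per-VM state; this is not libvirt's domain object. */
struct vm {
    pthread_mutex_t lock;       /* per-VM lock, also taken by RPC calls */
    int pending;                /* events waiting (simplified to a count) */
    int handled;                /* events processed so far */
};

static struct vm vms[MAX_VMS];

static void vms_init(void)
{
    for (size_t i = 0; i < MAX_VMS; i++) {
        pthread_mutex_init(&vms[i].lock, NULL);
        vms[i].pending = 0;
        vms[i].handled = 0;
    }
}

/* VM IDs 1, 6, 11, 16, 21, ... all map to worker 1. */
static unsigned worker_for_vm(unsigned vm_id)
{
    return vm_id % NUM_WORKERS;
}

/* One pass of worker `w` over its own VMs. A VM whose lock is busy is
 * skipped and retried on the next pass. Returns the number of events
 * handled in this pass. */
static int worker_pass(unsigned w)
{
    int done = 0;

    for (unsigned id = w; id < MAX_VMS; id += NUM_WORKERS) {
        struct vm *vm = &vms[id];

        if (pthread_mutex_trylock(&vm->lock) != 0)
            continue;           /* lock held elsewhere: move to next VM */

        done += vm->pending;
        vm->handled += vm->pending;
        vm->pending = 0;
        pthread_mutex_unlock(&vm->lock);
    }
    return done;
}
```

In this simplified form the pending count is only touched under the VM
lock; a real version would need a separately locked event list so that
queuing an event never waits on the VM lock.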
