Re: [RFC PATCH 0/4] cgroup aware workqueues


 



> Tejun Heo <htejun@xxxxxxxxx> wrote on 03/31/2016 08:14:35 PM:
>
> Hello, Michael.
> 
> On Thu, Mar 31, 2016 at 08:17:13AM +0200, Michael Rapoport wrote:
> > > There really shouldn't be any difference when using unbound
> > > workqueues.  workqueue becomes a convenience thing which manages
> > > worker pools and there shouldn't be any difference between workqueue
> > > workers and kthreads in terms of behavior.
> > 
> > I agree that there really shouldn't be any performance difference, but
> > the tests I've run show otherwise. I have no idea why, and I haven't had
> > time yet to investigate it.
> 
> I'd be happy to help digging into what's going on.  If kvm wants full
> control over the worker thread, kvm can use workqueue as a pure
> threadpool.  Schedule a work item to grab a worker thread with the
> matching attributes and keep using it as if it were a kthread.  While that
> wouldn't be able to take advantage of work item flushing and so on,
> it'd still be a simpler way to manage worker threads and the extra
> stuff like cgroup membership handling doesn't have to be duplicated.
> 
> > > > opportunity for optimization, at least for some workloads...
> > > 
> > > What sort of optimizations are we talking about?
> > 
> > Well, if we take Elvis (1) as the theoretical base, there could be a
> > benefit of doing I/O scheduling inside the vhost.
> 
> Yeah, if that actually is beneficial, take full control of the
> kworker thread.
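
Just to make sure we mean the same thing by "grabbing" a worker, here is a
minimal sketch of the pattern as I understand it. All names here
(grabbed_worker, grabbed_worker_fn, ...) are made up for illustration; this
is not the actual POC code. One long-running work item is queued on an
unbound workqueue, and its work function keeps looping on vhost work until
teardown, so the kworker effectively behaves like a dedicated vhost kthread:

/*
 * Hypothetical sketch only; names are illustrative, not the actual POC.
 * One long-running work item "grabs" a kworker from an unbound
 * workqueue and keeps it looping on vhost work until teardown.
 */
#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/llist.h>
#include <linux/wait.h>

struct grabbed_worker {
        struct work_struct work;        /* the item that grabs the kworker */
        struct llist_head work_list;    /* vhost work queued by the device */
        wait_queue_head_t wait;
        bool should_stop;
};

static void grabbed_worker_fn(struct work_struct *work)
{
        struct grabbed_worker *w =
                container_of(work, struct grabbed_worker, work);

        /*
         * This function does not return to the workqueue core until
         * teardown, so the kworker behaves like a dedicated vhost thread.
         */
        while (!READ_ONCE(w->should_stop)) {
                struct llist_node *node;

                wait_event_interruptible(w->wait,
                                !llist_empty(&w->work_list) ||
                                READ_ONCE(w->should_stop));

                node = llist_del_all(&w->work_list);
                /* ... walk 'node' and run each queued vhost work item ... */
        }
}

static int grabbed_worker_start(struct grabbed_worker *w)
{
        struct workqueue_struct *wq;

        wq = alloc_workqueue("vhost_grabbed", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
        if (!wq)
                return -ENOMEM;

        init_llist_head(&w->work_list);
        init_waitqueue_head(&w->wait);
        w->should_stop = false;
        INIT_WORK(&w->work, grabbed_worker_fn);
        queue_work(wq, &w->work);       /* from now on the kworker is "ours" */
        return 0;
}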

It took me a while, but I finally had time to run some benchmarks.
I've compared guest-to-guest netperf throughput with three variants of the
vhost implementation:
(1) vanilla 4.4 (baseline)
(2) 4.4 + unbound workqueues based on Bandan's patches [1]
(3) 4.4 + "grabbed" worker thread. This is my POC implementation that
actually follows your proposal to take full control over the worker
thread (the vhost-side queueing is sketched below).
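
For (3), the vhost side then hands work to the grabbed kworker much like
vhost_work_queue() wakes the dedicated kthread today. Roughly, reusing the
made-up names from the sketch above (again, illustration only, not the
actual patch):

static void grabbed_worker_queue(struct grabbed_worker *w,
                                 struct llist_node *vhost_work_node)
{
        /* Publish the work item and kick the looping kworker. */
        llist_add(vhost_work_node, &w->work_list);
        wake_up(&w->wait);
}

static void grabbed_worker_stop(struct grabbed_worker *w)
{
        WRITE_ONCE(w->should_stop, true);
        wake_up(&w->wait);
        /* Wait for grabbed_worker_fn() to return, releasing the kworker. */
        flush_work(&w->work);
}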

I've run two guests without any CPU pinning and without any actual
interaction with cgroups.
Here are the results (throughput in Mbits/sec for each message size in
bytes):

size |   64  |   256   |  1024   |  4096   |  16384
-----+-------+---------+---------+---------+---------
(1)  | 496.8 | 1346.31 | 6058.49 | 13736.2 | 13541.4
(2)  | 493.3 | 1604.03 | 5723.68 | 10181.4 | 15572.4
(3)  | 489.7 | 1437.86 | 6251.12 | 12774.2 | 12867.9 


From what I see, a different variant outperforms the others depending on
the message size.
Moreover, I'd expect that when vhost completely takes over the worker
thread, there would be no difference vs. the current state.

Tejun, can you help explain these results?

[1] http://thread.gmane.org/gmane.linux.network/286858





