[RFC PATCH 0/4] cgroup aware workqueues

At Linuxcon last year, based on our presentation "vhost: sharing is better" [1],
we briefly discussed the idea of cgroup aware workqueues with Tejun. The
following patches are a result of that discussion. They are by no means
complete - the changes cover unbound workqueues only - but I wanted to
present my unfinished work as an RFC and get some feedback.

1/4 and 3/4 are simple cgroup changes and add a helper function.
2/4 is the main implementation.
4/4 changes vhost to use workqueues with support for cgroups (a rough
sketch of the resulting submission path is below).
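
To give an idea of the direction, here is a rough, hypothetical sketch
of what the vhost submission path could look like when backed by a
workqueue; the structure and field names (dev->wq, the embedded
work_struct) are assumptions for illustration, not necessarily what 4/4
actually does:

/*
 * Illustrative sketch only: vhost work submission when the works are
 * handed to an unbound (cgroup aware) workqueue instead of vhost's own
 * worker thread.  Field names here are assumptions for this example.
 */
struct vhost_work {
	struct work_struct work;	/* embedded so queue_work() can run it */
	vhost_work_fn_t fn;
};

static void vhost_workqueue_fn(struct work_struct *work)
{
	struct vhost_work *vwork = container_of(work, struct vhost_work, work);

	vwork->fn(vwork);
}

void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
	/* dev->wq: per-device workqueue, assumed allocated with WQ_UNBOUND */
	queue_work(dev->wq, &work->work);
}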

Accounting:
For cgroup awareness, a worker thread servicing a userspace task A
attached to cgroup X attaches itself to all of the cgroups of the task
it is servicing. This series does that for unbound workqueues, which
means that all tasks attached to the same combination of cgroups could
potentially be serviced by the same worker thread. The same technique
could be applied to bound (per-cpu) workqueues as well.

Example:
vhost creates a worker thread when invoked for a kvm guest. Since the
guest is a normal process, the kernel thread servicing it should be
attached to the vm process' cgroups.
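
The attach step itself can be a thin wrapper around the existing
cgroup_attach_task_all() helper; a minimal sketch (the wrapper name is
made up for illustration):

/*
 * Move the worker (current, a kworker) into every cgroup of the
 * userspace task it is servicing, e.g. the qemu process that owns the
 * vhost device.  cgroup_attach_task_all() migrates @tsk into all of
 * @from's cgroups across hierarchies.
 */
static int worker_attach_to_owner_cgroups(struct task_struct *owner)
{
	return cgroup_attach_task_all(owner, current);
}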

Design:

The fundamental addition is a cgroup aware worker pool and, as stated
above, only for the unbound case.

These changes don't populate the "numa awareness" fields/attrs, and
unlike unbound numa worker pools, cgroup worker pools are created on
demand. Every work request could potentially have a new cgroup aware
pool created for it, based on the combination of cgroups the submitting
task is attached to. The workqueue code itself remains unaware of the
actual cgroups - it relies on helper functions provided by cgroups
either to 1. check that all the cgroups of two tasks match or 2. attach
a worker thread to all cgroups of a userspace task. We maintain a list
of cgroup aware pools so that when a new request comes in and a suitable
worker pool needs to be found, we search the list first before creating
a new one. A worker pool also stores a list of "task owners" - the
processes that we are currently serving.
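
As a rough sketch of that lookup, with made-up names
(cgroup_match_task() stands in for the comparison helper added in 1/4;
the structs below only mirror the bookkeeping described above and are
not the actual definitions from 2/4):

struct cg_pool_owner {
	struct task_struct	*task;	/* userspace task being serviced */
	struct list_head	node;
};

struct cg_worker_pool {
	struct list_head	owners;	/* tasks currently being serviced */
	struct list_head	node;	/* link on the global cg_pools list */
	/* ... plus the usual worker_pool state ... */
};

static LIST_HEAD(cg_pools);

/* Return an existing pool whose owners share all cgroups with @task. */
static struct cg_worker_pool *find_cg_pool(struct task_struct *task)
{
	struct cg_worker_pool *pool;
	struct cg_pool_owner *owner;

	list_for_each_entry(pool, &cg_pools, node)
		list_for_each_entry(owner, &pool->owners, node)
			if (cgroup_match_task(owner->task, task))
				return pool;

	return NULL;	/* caller creates a new pool and records the owner */
}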

Testing:
Created some qemu processes and attached them to different cgroups.
Verified that new worker pools are created for tasks that are attached
to different cgroups (and reused for the ones that belong to the same
cgroups).

Some simple performance testing with netperf is below. These numbers
shouldn't really depend on these patches, though: the cgroup attach and
match functions are not in hot paths for the general usage that this
test exercises.

Netperf:
Two guests running netperf in parallel.
                                  Without patches    With patches

TCP_STREAM (10^6 bits/second)          975.45           978.88
TCP_RR (Trans/second)                20121            18820.82
UDP_STREAM (10^6 bits/second)         1287.82           1184.5
UDP_RR (Trans/second)                20766.72          19667.08
Time for a 4G iso download           2m 33s             3m 02s

Todo:
What about bound (per-cpu) workqueues?
What happens when the cgroups of a running process change?
sysfs variables
Sanity check the flush and destroy paths.
More extensive testing
Can we optimize the search/match/attach functions?
Better performance numbers? (although the ones above don't look bad)

[1] http://events.linuxfoundation.org/sites/events/files/slides/kvm_forum_2015_vhost_sharing_is_better.pdf

Bandan Das (4):
  cgroup: Introduce a function to compare two tasks
  workqueue: introduce support for attaching to cgroups
  cgroup: use spin_lock_irq for cgroup match and attach fns
  vhost: use workqueues for the works

 drivers/vhost/vhost.c       | 103 ++++++++++++++++++---
 drivers/vhost/vhost.h       |   2 +
 include/linux/cgroup.h      |   1 +
 include/linux/workqueue.h   |   2 +
 kernel/cgroup.c             |  40 ++++++++-
 kernel/workqueue.c          | 212 +++++++++++++++++++++++++++++++++++++++++---
 kernel/workqueue_internal.h |   4 +
 7 files changed, 335 insertions(+), 29 deletions(-)

-- 
2.5.0
