On Thu, Jun 11, 2015 at 04:24:18PM +0300, Andrey Korolyov wrote:
> On Thu, Jun 11, 2015 at 4:13 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> > On Thu, Jun 11, 2015 at 04:06:59PM +0300, Andrey Korolyov wrote:
> >> On Thu, Jun 11, 2015 at 2:33 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> >> > On Thu, Jun 11, 2015 at 02:16:50PM +0300, Andrey Korolyov wrote:
> >> >> On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> >> >> > On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
> >> >> >> Hi Daniel,
> >> >> >>
> >> >> >> would it be possible to adopt an optional tunable for the virCgroup
> >> >> >> mechanism that disables nested (per-thread) cgroup creation? Those
> >> >> >> cgroups bring visible overhead for many-threaded guest workloads,
> >> >> >> almost 5% in a non-congested host CPU state, primarily because the
> >> >> >> host scheduler has to make many more decisions with those cgroups
> >> >> >> than without them. We also experienced a lot of host lockups with the
> >> >> >> currently used cgroup placement, and disabled the nested behaviour a
> >> >> >> couple of years ago. Though the current patch simply carves out the
> >> >> >> mentioned behaviour, leaving only top-level per-machine cgroups, it
> >> >> >> could serve as a basis for an upstream change after some adaptation;
> >> >> >> that's why I'm asking about the chance of its acceptance. This
> >> >> >> message is a kind of 'feature request': it can either be
> >> >> >> accepted/dropped from our side, or someone may give a hand and redo
> >> >> >> it from scratch. The detailed benchmarks were taken on a 3.10.y host;
> >> >> >> if anyone is interested in numbers for the latest stable kernel, I
> >> >> >> can update them.
> >> >> >
> >> >> > When you say nested cgroup creation, are you referring to the modern
> >> >> > libvirt hierarchy, or the legacy hierarchy - as described here:
> >> >> >
> >> >> >   http://libvirt.org/cgroups.html
> >> >> >
> >> >> > The current libvirt setup used for a year or so now is much shallower
> >> >> > than previously, to the extent that we'd consider performance problems
> >> >> > with it to be the job of the kernel to fix.
> >> >>
> >> >> Thanks, I'm referring to the 'new nested' hierarchy for the overhead
> >> >> mentioned above. The host crashes I mentioned happened with the old
> >> >> hierarchy a while back; I forgot to mention this. Despite the
> >> >> flattening of the topology in the current scheme, it should be possible
> >> >> to disable fine-grained group creation for the VM threads for users who
> >> >> don't need per-vCPU CPU pinning/accounting (though the overhead is
> >> >> caused by placement in the cpu cgroup, not by the accounting/pinning
> >> >> ones; I'm assuming equal distribution under such a disablement for all
> >> >> nested-aware cgroup types). That's the point for now.
> >> >
> >> > Ok, so the per-vCPU cgroups are used for a couple of things
> >> >
> >> >  - Setting scheduler tunables - period/quota/shares/etc
> >> >  - Setting CPU pinning
> >> >  - Setting NUMA memory pinning
> >> >
> >> > In addition to the per-vCPU cgroup, we have one cgroup for each
> >> > I/O thread, and also one more for general QEMU emulator threads.
> >> >
> >> > In the case of CPU pinning we already have automatic fallback to
> >> > sched_setaffinity if the CPUSET controller isn't available.
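
For context, the sched_setaffinity() fallback mentioned above boils down to
roughly the following minimal C sketch. This is an illustration, not
libvirt's actual code; the helper name, thread ID and CPU number are
placeholders chosen for the example.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Pin one thread to one host CPU without using the cpuset cgroup
     * controller.  libvirt applies the same idea to the vCPU thread IDs
     * it obtains from QEMU; tid and cpu here are placeholder values. */
    static int pin_thread_to_cpu(pid_t tid, int cpu)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);

        if (sched_setaffinity(tid, sizeof(mask), &mask) < 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        /* tid 0 means "the calling thread"; pin it to host CPU 0. */
        return pin_thread_to_cpu(0, 0) == 0 ? 0 : 1;
    }

The point being that pinning alone does not require the cpuset controller,
whereas the period/quota/shares tunables do require the cpu controller.
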
> >> > We could in theory start off without the per-vCPU/emulator/I/O
> >> > cgroups and only create them as & when the feature is actually
> >> > used. The concern I would have though is that changing the cgroups
> >> > layout on the fly may cause unexpected side effects in the behaviour
> >> > of the VM. More critically, there would be a lot of places in the
> >> > code where we would need to deal with this, which could hurt
> >> > maintainability.
> >> >
> >> > How confident are you that the performance problems you see are
> >> > inherent to the actual use of the cgroups, and not instead the result
> >> > of some particular bad choice of default parameters we might have
> >> > left in the cgroups? In general I'd have a desire to try to eliminate
> >> > the perf impact before we consider the complexity of disabling this
> >> > feature.
> >> >
> >> > Regards,
> >> > Daniel
> >>
> >> Hm, what are you proposing to begin with in testing terms? By my
> >> understanding, excessive cgroup usage along with small scheduler quanta
> >> *will* lead to some overhead anyway. Let's look at the numbers, which I
> >> will bring tomorrow; the mentioned five percent was caught on a guest
> >> 'perf numa xxx' for different kinds of mappings and host behaviour
> >> (post-3.8): memory auto-migration on/off, a kind of 'NUMA passthrough'
> >> (grouping vCPU threads according to the host and emulated guest NUMA
> >> topologies), and totally scattered, unpinned threads within a single
> >> NUMA node and across multiple NUMA nodes. As the result for 3.10.y,
> >> there was a five-percent difference between the best-performing case
> >> with thread-level cpu cgroups and a 'totally scattered' case on a
> >> simple mid-range two-socket node. If you think the choice of an
> >> emulated workload is wrong, please let me know; I was afraid that a
> >> non-synthetic workload in the guest might suffer from a range of side
> >> factors and therefore chose perf for this task.
> >
> > Benchmarking isn't my area of expertise, but you should be able to just
> > disable the CPUSET controller entirely in qemu.conf. If we got some
> > comparative results for with & without CPUSET that'd be an interesting
> > place to start. If it shows a clear difference, I might be able to get
> > some of the Red Hat performance team to dig into what's going wrong at
> > either the libvirt or kernel level.
>
> Thanks, let's wait for the numbers. I mentioned cpuset only as a matter
> of good-bad comparison; the main suspect for me is still the scheduler
> and the quotas/weights in the CPU cgroup.

I think we allow that controller to be disabled too with QEMU.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
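
As a footnote to the qemu.conf suggestion above: the knob in question is the
cgroup_controllers list in /etc/libvirt/qemu.conf. A minimal sketch of a
configuration that leaves out the cpuset and cpu controllers might look like
the following; the exact default list varies between libvirt versions, and
this is an illustration rather than a recommendation.

    # /etc/libvirt/qemu.conf
    #
    # Restrict which cgroup controllers libvirt uses for QEMU guests.
    # Leaving "cpuset" (and "cpu") out of the list means libvirt will not
    # create or manage those controllers for its VMs; CPU pinning then
    # falls back to sched_setaffinity as discussed in the thread.
    cgroup_controllers = [ "devices", "memory", "blkio", "cpuacct" ]

With such a change, the with/without comparison Daniel proposes should only
require restarting libvirtd and the guests between runs, since the cgroup
layout is established when a guest starts.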