On Thu, Jun 11, 2015 at 04:24:18PM +0300, Andrey Korolyov wrote:
> On Thu, Jun 11, 2015 at 4:13 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> > On Thu, Jun 11, 2015 at 04:06:59PM +0300, Andrey Korolyov wrote:
> >> On Thu, Jun 11, 2015 at 2:33 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> >> > On Thu, Jun 11, 2015 at 02:16:50PM +0300, Andrey Korolyov wrote:
> >> >> On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> >> >> > On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
> >> >> >> Hi Daniel,
> >> >> >>
> >> >> >> would it be possible to adopt an optional tunable for the virCgroup
> >> >> >> mechanism that disables nested (per-thread) cgroup creation? Those
> >> >> >> cgroups bring visible overhead for many-threaded guest workloads,
> >> >> >> almost 5% in a non-congested host CPU state, primarily because the
> >> >> >> host scheduler has to make many more decisions with those cgroups
> >> >> >> than without them. We also experienced a lot of host lockups with the
> >> >> >> currently used cgroup placement, and disabled the nested behaviour a
> >> >> >> couple of years ago. Though the current patch simply carves out the
> >> >> >> mentioned behaviour, leaving only top-level per-machine cgroups, it
> >> >> >> could serve as a basis for an upstream change after some adaptation;
> >> >> >> that's why I'm asking about the chance of its acceptance. This
> >> >> >> message is a kind of 'feature request': it can either be
> >> >> >> accepted/dropped from our side, or someone may give a hand and redo
> >> >> >> it from scratch. The detailed benchmarks were taken on a 3.10.y host;
> >> >> >> if anyone is interested in numbers for the latest stable kernel, I
> >> >> >> can update them.
> >> >> >
> >> >> > When you say nested cgroup creation, are you referring to the modern
> >> >> > libvirt hierarchy, or the legacy hierarchy - as described here:
> >> >> >
> >> >> >   http://libvirt.org/cgroups.html
> >> >> >
> >> >> > The current libvirt setup used for a year or so now is much shallower
> >> >> > than previously, to the extent that we'd consider performance problems
> >> >> > with it to be the job of the kernel to fix.
> >> >>
> >> >> Thanks, I'm referring to the 'new nested' hierarchy for the overhead
> >> >> mentioned above. The host crashes I mentioned happened with the old
> >> >> hierarchy a while back; I forgot to mention this. Despite the
> >> >> flattening of the topology in the current scheme, it should be possible
> >> >> to disable fine-grained group creation for the VM threads for users who
> >> >> don't need per-vCPU CPU pinning/accounting (though the overhead is
> >> >> caused by placement in the cpu cgroup, not by the accounting/pinning
> >> >> ones; I'm assuming equal distribution under such a disablement for all
> >> >> nested-aware cgroup types). That's the point for now.
> >> >
> >> > Ok, so the per-vCPU cgroups are used for a couple of things
> >> >
> >> >  - Setting scheduler tunables - period/quota/shares/etc
> >> >  - Setting CPU pinning
> >> >  - Setting NUMA memory pinning
> >> >
> >> > In addition to the per-vCPU cgroup, we have one cgroup for each
> >> > I/O thread, and also one more for general QEMU emulator threads.
> >> >
> >> > In the case of CPU pinning we already have automatic fallback to
> >> > sched_setaffinity if the CPUSET controller isn't available.
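
For context, the sched_setaffinity() fallback mentioned above boils down to
roughly the following minimal C sketch. This is an illustration, not
libvirt's actual code; the helper name, thread ID and CPU number are
placeholders chosen for the example.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Pin one thread to one host CPU without using the cpuset cgroup
     * controller.  libvirt applies the same idea to the vCPU thread IDs
     * it obtains from QEMU; tid and cpu here are placeholder values. */
    static int pin_thread_to_cpu(pid_t tid, int cpu)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);

        if (sched_setaffinity(tid, sizeof(mask), &mask) < 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        /* tid 0 means "the calling thread"; pin it to host CPU 0. */
        return pin_thread_to_cpu(0, 0) == 0 ? 0 : 1;
    }

The point being that pinning alone does not require the cpuset controller,
whereas the period/quota/shares tunables do require the cpu controller.
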
> >> > We could in theory start off without the per-vCPU/emulator/I/O
> >> > cgroups and only create them as & when the feature is actually
> >> > used. The concern I would have though is that changing the cgroups
> >> > layout on the fly may cause unexpected side effects in the behaviour
> >> > of the VM. More critically, there would be a lot of places in the
> >> > code where we would need to deal with this, which could hurt
> >> > maintainability.
> >> >
> >> > How confident are you that the performance problems you see are
> >> > inherent to the actual use of the cgroups, and not instead the result
> >> > of some particular bad choice of default parameters we might have
> >> > left in the cgroups? In general I'd have a desire to try to eliminate
> >> > the perf impact before we consider the complexity of disabling this
> >> > feature.
> >> >
> >> > Regards,
> >> > Daniel
> >>
> >> Hm, what are you proposing to begin with in testing terms? By my
> >> understanding, excessive cgroup usage along with small scheduler quanta
> >> *will* lead to some overhead anyway. Let's look at the numbers, which I
> >> will bring tomorrow; the mentioned five percent was caught on a guest
> >> 'perf numa xxx' for different kinds of mappings and host behaviour
> >> (post-3.8): memory auto-migration on/off, a kind of 'NUMA passthrough'
> >> (grouping vCPU threads according to the host and emulated guest NUMA
> >> topologies), and totally scattered, unpinned threads within a single
> >> NUMA node and across multiple NUMA nodes. As the result for 3.10.y,
> >> there was a five-percent difference between the best-performing case
> >> with thread-level cpu cgroups and a 'totally scattered' case on a
> >> simple mid-range two-socket node. If you think the choice of an
> >> emulated workload is wrong, please let me know; I was afraid that a
> >> non-synthetic workload in the guest might suffer from a range of side
> >> factors and therefore chose perf for this task.
> >
> > Benchmarking isn't my area of expertise, but you should be able to just
> > disable the CPUSET controller entirely in qemu.conf. If we got some
> > comparative results for with & without CPUSET that'd be an interesting
> > place to start. If it shows a clear difference, I might be able to get
> > some of the Red Hat performance team to dig into what's going wrong at
> > either the libvirt or kernel level.
>
> Thanks, let's wait for the numbers. I mentioned cpuset only as a matter
> of good-bad comparison; the main suspect for me is still the scheduler
> and the quotas/weights in the CPU cgroup.

I think we allow that controller to be disabled too with QEMU.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
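
As a footnote to the qemu.conf suggestion above: the knob in question is the
cgroup_controllers list in /etc/libvirt/qemu.conf. A minimal sketch of a
configuration that leaves out the cpuset and cpu controllers might look like
the following; the exact default list varies between libvirt versions, and
this is an illustration rather than a recommendation.

    # /etc/libvirt/qemu.conf
    #
    # Restrict which cgroup controllers libvirt uses for QEMU guests.
    # Leaving "cpuset" (and "cpu") out of the list means libvirt will not
    # create or manage those controllers for its VMs; CPU pinning then
    # falls back to sched_setaffinity as discussed in the thread.
    cgroup_controllers = [ "devices", "memory", "blkio", "cpuacct" ]

With such a change, the with/without comparison Daniel proposes should only
require restarting libvirtd and the guests between runs, since the cgroup
layout is established when a guest starts.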