At 06/10/2011 05:20 PM, Daniel P. Berrange wrote:
> On Wed, Jun 08, 2011 at 02:20:23PM -0500, Adam Litke wrote:
>> Hi all. In this post I would like to bring up 3 issues which are
>> tightly related: 1. unwanted behavior when using CFS hard limits with
>> libvirt, 2. scaling cputune.share according to the number of vcpus,
>> 3. an API proposal for CFS hard limit support.
>>
>>
>> === 1 ===
>> Mark Peloquin (on cc:) has been looking at implementing CFS hard
>> limit support on top of the existing libvirt cgroups implementation
>> and he has run into some unwanted behavior when enabling quotas that
>> seems to be affected by the cgroup hierarchy being used by libvirt.
>>
>> Here are Mark's words on the subject (posted by me while Mark joins
>> this mailing list):
>> ------------------
>> I've conducted a number of measurements using CFS.
>>
>> The system config is a 2-socket Nehalem system with 64GB of RAM,
>> running RHEL6.1-snap4. The guest VMs being used run RHEL5.5 (32-bit).
>> I've replaced the kernel with 2.6.39-rc6+ plus the patches from
>> Paul-V6-upstream-breakout.tar.bz2 for CFS bandwidth. The test config
>> uses 5 VMs of various vcpu and memory sizes: 2 VMs with 2 vcpus and
>> 4GB of memory, 1 VM with 4 vcpus/8GB, another VM with 8 vcpus/16GB
>> and finally a VM with 16 vcpus/16GB.
>>
>> Thus far the tests have been limited to cpu-intensive workloads. Each
>> VM runs a single instance of the workload. The workload is configured
>> to create one thread for each vcpu in the VM, so it is capable of
>> completely saturating each vcpu in each VM.
>>
>> CFS was tested using two different topologies.
>>
>> First, vcpu cgroups were created under each VM cgroup created by
>> libvirt. The vcpu threads from the VM cgroup's tasks file were moved
>> to the tasks list of each vcpu cgroup, one thread per vcpu cgroup.
>> This tree structure permits setting the CFS quota and period per
>> vcpu. The default values for cpu.shares (1024), quota (-1) and period
>> (500000us) were used in each VM cgroup and inherited by each vcpu
>> cgroup. With these settings the workload generated system cpu
>> utilization (measured in the host) of >99% guest, >0.1% idle, 0.14%
>> user and 0.38% system.
>>
>> Second, using the same topology, the CFS quota in each vcpu's cgroup
>> was set to 250000us, allowing each vcpu to consume 50% of a cpu. The
>> cpu workloads were run again. This time the total system cpu
>> utilization was measured at 75% guest, ~24% idle, 0.15% user and
>> 0.40% system.
>>
>> The topology was then changed such that a cgroup for each vcpu was
>> created directly in /cgroup/cpu.
>>
>> The first test used the default/inherited shares and CFS quota and
>> period. The measured system cpu utilization was >99% guest, ~0.5%
>> idle, 0.13% user and 0.38% system, similar to the default settings
>> using vcpu cgroups under libvirt.
>>
>> The next test, like before the topology change, set the vcpu quota
>> values to 250000us, or 50% of a cpu. In this case the measured system
>> cpu utilization was ~92% guest, ~7.5% idle, 0.15% user and 0.38%
>> system.
>>
>> We can see that moving the vcpu cgroups out from under libvirt/qemu
>> makes a big difference in idle cpu time.
>>
>> Does this suggest a possible problem with libvirt?
>> ------------------
>
> I can't really understand from your description what the different
> setups are. You're talking about libvirt vcpu cgroups, but nothing in
> libvirt does vcpu-based cgroups; our cgroup granularity is always
> per-VM.
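If I read the description above correctly, the per-vcpu topology in the
first set of tests amounts to roughly the following against the cgroup
filesystem. This is only an illustrative sketch to make the setup
concrete; the cgroup mount point, guest name and vcpu thread id below
are made up, not the exact values from Mark's test:

--- snip ---
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write a single value into a cgroup control file. */
static int cg_write(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;
    if (fprintf(fp, "%s\n", value) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp);
}

int main(void)
{
    /* Illustrative names only: the real mount point, guest name and
     * vcpu thread id depend on the host configuration. */
    const char *dir = "/cgroup/cpu/libvirt/qemu/guest1/vcpu0";
    char path[256];

    mkdir(dir, 0755);                   /* per-vcpu child cgroup */

    snprintf(path, sizeof(path), "%s/cpu.cfs_period_us", dir);
    cg_write(path, "500000");           /* period: 500000us */

    snprintf(path, sizeof(path), "%s/cpu.cfs_quota_us", dir);
    cg_write(path, "250000");           /* quota: 250000us = 50% of one cpu */

    snprintf(path, sizeof(path), "%s/tasks", dir);
    cg_write(path, "12345");            /* move one vcpu thread into the cgroup */

    return 0;
}
--- snip ---

The second topology is the same except that the vcpu cgroups hang
directly off the cpu controller mount (e.g. /cgroup/cpu/vcpu0) instead
of under the per-VM cgroup that libvirt creates.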
>
>> === 2 ===
>> Something else we are seeing is that libvirt's default setting for
>> cputune.share is 1024 for any domain (regardless of how many vcpus
>> are configured). This ends up hindering the performance of really
>> large VMs (with lots of vcpus) compared to smaller ones, since all
>> domains are given an equal share. Would folks consider changing the
>> default for 'shares' to be a quantity scaled by the number of vcpus,
>> so that bigger domains get to use proportionally more host cpu
>> resources?
>
> Well, that's just the kernel default setting actually. The intent of
> the default cgroups configuration for a VM is that it should be
> identical to the configuration if the VM was *not* in any cgroups. So
> I think that gives some justification for setting the cpu shares
> relative to the # of vCPUs by default, otherwise we have a regression
> vs not using cgroups.
>
>> === 3 ===
>> Besides the above issues, I would like to open a discussion on what
>> the libvirt API for enabling cpu hard limits should look like. Here
>> is what I was thinking:
>>
>> Two additional scheduler parameters (based on the names given in the
>> cgroup fs) will be recognized for qemu domains: 'cfs_period' and
>> 'cfs_quota'. These can use the existing
>> virDomain[Get|Set]SchedulerParameters() API. The domain XML schema
>> would be updated to permit the following:
>>
>> --- snip ---
>> <cputune>
>>   ...
>>   <cfs_period>1000000</cfs_period>
>>   <cfs_quota>500000</cfs_quota>
>> </cputune>
>> --- snip ---
>
> I don't think 'cfs_' should be in the names here. These absolute
> limits on CPU time could easily be applicable to non-CFS schedulers
> or non-Linux hypervisors.

Do you mean the element names should be 'period' and 'quota'? The files
provided by CFS bandwidth are named cpu.cfs_period_us and
cpu.cfs_quota_us, so I think he used 'cfs_' because it matches the
filenames. I do not mind which element names we use, but I am writing
the patch, so I need to know which names to settle on (a usage sketch
with the plain names follows at the end of this mail).

>
>> To actuate these configuration settings, we simply apply the values
>> to the appropriate cgroup(s) for the domain. We would prefer that
>> each vcpu be in its own cgroup to ensure equal and fair scheduling
>> across all vcpus running on the system. (We will need to resolve the
>> issues described by Mark in order to figure out where to hang these
>> cgroups.)
>
> The reason for putting VMs in cgroups is that, because KVM is
> multithreaded, using cgroups is the only way to control settings of
> the VM as a whole. If you just want to control individual VCPU
> settings, that can be done without cgroups just by setting the
> process' scheduling priority via the normal APIs. Creating cgroups at
> the granularity of individual vCPUs is somewhat troublesome, because
> if the administrator has mounted other cgroup controllers at the same
> location as the 'cpu' controller, then putting each VCPU in a separate
> cgroup will negatively impact other aspects of the VM. Also, KVM has a
> number of other non-VCPU threads which consume a non-trivial amount of
> CPU time and which often come & go over time. So IMHO the smallest
> cgroup granularity should remain per-VM.
>
> Daniel
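For reference, here is a minimal sketch of how a management application
could set these tunables through the existing scheduler-parameter call
once the patch is in place, assuming the parameter names end up as
plain "period" and "quota" as Daniel suggests. The names, the guest
name and the values are only placeholders until the naming question is
settled:

--- snip ---
#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn)
        return 1;

    virDomainPtr dom = virDomainLookupByName(conn, "guest1");
    if (!dom) {
        virConnectClose(conn);
        return 1;
    }

    virSchedParameter params[2];
    memset(params, 0, sizeof(params));

    /* bandwidth enforcement interval, in microseconds */
    strncpy(params[0].field, "period", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    params[0].type = VIR_DOMAIN_SCHED_FIELD_ULLONG;
    params[0].value.ul = 1000000;

    /* runtime allowed per period, in microseconds (-1 = unlimited) */
    strncpy(params[1].field, "quota", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    params[1].type = VIR_DOMAIN_SCHED_FIELD_LLONG;
    params[1].value.l = 500000;

    if (virDomainSetSchedulerParameters(dom, params, 2) < 0)
        fprintf(stderr, "failed to set scheduler parameters\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
--- snip ---

Whether the driver then applies those values to the single per-VM
cgroup or to per-vcpu cgroups is exactly the open question from
section 1 above.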