Hi all. In this post I would like to bring up 3 issues which are tightly related:

1. Unwanted behavior when using CFS hardlimits with libvirt
2. Scaling cputune.share according to the number of vcpus
3. API proposal for CFS hardlimits support

=== 1 ===

Mark Peloquin (on cc:) has been looking at implementing CFS hard limit support on top of the existing libvirt cgroups implementation, and he has run into some unwanted behavior when enabling quotas that seems to be affected by the cgroup hierarchy used by libvirt.

Here are Mark's words on the subject (posted by me while Mark joins this mailing list):

------------------

I've conducted a number of measurements using CFS.

The system config is a 2 socket Nehalem system with 64GB ram. Installed is RHEL6.1-snap4. The guest VMs run RHEL5.5 32-bit. I've replaced the kernel with 2.6.39-rc6+ with patches from Paul-V6-upstream-breakout.tar.bz2 for CFS bandwidth.

The test config uses 5 VMs of various vcpu and memory sizes: 2 VMs with 2 vcpus and 4GB of memory each, 1 VM with 4 vcpus/8GB, another VM with 8 vcpus/16GB and finally a VM with 16 vcpus/16GB.

Thus far the tests have been limited to cpu intensive workloads. Each VM runs a single instance of the workload. The workload is configured to create one thread for each vcpu in the VM, so it is capable of completely saturating each vcpu in each VM.

CFS was tested using two different topologies.

First, vcpu cgroups were created under each VM cgroup created by libvirt. The vcpu threads from the VM's cgroup/tasks were moved to the tasks list of each vcpu cgroup, one thread to each vcpu cgroup. This tree structure permits setting CFS quota and period per vcpu. Default values for cpu.shares (1024), quota (-1) and period (500000us) were used in each VM cgroup and inherited by the vcpu cgroup. With these settings the workload generated system cpu utilization (measured in the host) of >99% guest, >0.1% idle, 0.14% user and 0.38% system.
Second, using the same topology, the CFS quota in each vcpu's cgroup was set to 250000us, allowing each vcpu to consume 50% of a cpu. The cpu workload was run again. This time the total system cpu utilization was measured at 75% guest, ~24% idle, 0.15% user and 0.40% system.

The topology was then changed such that a cgroup for each vcpu was created in /cgroup/cpu. The first test used the default/inherited shares and CFS quota and period. The measured system cpu utilization was >99% guest, ~0.5% idle, 0.13% user and 0.38% system, similar to the default settings using vcpu cgroups under libvirt.

The next test, like the one before the topology change, set the vcpu quota values to 250000us, or 50% of a cpu. In this case the measured system cpu utilization was ~92% guest, ~7.5% idle, 0.15% user and 0.38% system.

We can see that moving the vcpu cgroups out from under libvirt/qemu makes a big difference in idle cpu time.

Does this suggest a possible problem with libvirt?

------------------

Has anyone else seen this type of behavior when using cgroups with CFS hardlimits? We are working with the kernel community to see if there might be a bug in cgroups itself.

=== 2 ===

Something else we are seeing is that libvirt's default setting for cputune.share is 1024 for any domain, regardless of how many vcpus are configured. This ends up hindering the performance of really large VMs (with lots of vcpus) compared to smaller ones, since all domains are given an equal share. Would folks consider changing the default for 'shares' to a quantity scaled by the number of vcpus, so that bigger domains get to use proportionally more host cpu resource?

=== 3 ===

Besides the above issues, I would like to open a discussion on what the libvirt API for enabling cpu hardlimits should look like. Here is what I was thinking:

Two additional scheduler parameters (based on the names given in the cgroup fs) will be recognized for qemu domains: 'cfs_period' and 'cfs_quota'.
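To make the relationship between the two values concrete, here is a small Python sketch of the arithmetic only. It is not libvirt code, and the helper name cpu_limit_fraction is made up for this example. It assumes the quota and period are both given in microseconds and that a negative quota (the -1 default mentioned in Mark's tests) means "no limit".

```python
def cpu_limit_fraction(quota_us, period_us):
    """Return the CPU ceiling as a fraction of one cpu, or None if unlimited.

    Hypothetical helper for illustration; both arguments are microseconds.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if quota_us < 0:
        # A negative quota (-1) is treated here as "no hard limit".
        return None
    return quota_us / period_us


# Values used in Mark's tests: 250000us of quota per 500000us period.
print(cpu_limit_fraction(250000, 500000))   # 0.5

# Default quota of -1: no hard limit applies.
print(cpu_limit_fraction(-1, 500000))       # None
```

With this reading, a quota equal to half the period caps the group at 50% of one cpu, which matches the 250000us setting described in Mark's tests.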
These can use the existing virDomain[Get|Set]SchedulerParameters() API. The Domain XML schema would be updated to permit the following:

--- snip ---
<cputune>
  ...
  <cfs_period>1000000</cfs_period>
  <cfs_quota>500000</cfs_quota>
</cputune>
--- snip ---

To actuate these configuration settings, we simply apply the values to the appropriate cgroup(s) for the domain. We would prefer that each vcpu be in its own cgroup to ensure equal and fair scheduling across all vcpus running on the system. (We will need to resolve the issues described by Mark in order to figure out where to hang these cgroups.)

Thanks for sticking with me through this long email. I greatly appreciate your thoughts and comments on these topics.

--
Adam Litke
IBM Linux Technology Center