At 06/09/2011 03:20 AM, Adam Litke wrote:
> Hi all. In this post I would like to bring up 3 issues which are
> tightly related: 1. unwanted behavior when using CFS hard limits with
> libvirt, 2. scaling cputune.share according to the number of vcpus,
> 3. API proposal for CFS hard limits support.
>
>
> === 1 ===
> Mark Peloquin (on cc:) has been looking at implementing CFS hard limit
> support on top of the existing libvirt cgroups implementation and he
> has run into some unwanted behavior when enabling quotas that seems to
> be affected by the cgroup hierarchy being used by libvirt.
>
> Here are Mark's words on the subject (posted by me while Mark joins
> this mailing list):
> ------------------
> I've conducted a number of measurements using CFS.
>
> The system config is a 2-socket Nehalem system with 64GB ram.
> Installed is RHEL6.1-snap4. The guest VMs being used have RHEL5.5 -
> 32bit. I've replaced the kernel with 2.6.39-rc6+ with patches from
> Paul-V6-upstream-breakout.tar.bz2 for CFS bandwidth. The test config
> uses 5 VMs of various vcpu and memory sizes. Being used are 2 VMs with
> 2 vcpus and 4GB of memory, 1 VM with 4 vcpus/8GB, another VM with
> 8 vcpus/16GB and finally a VM with 16 vcpus/16GB.
>
> Thus far the tests have been limited to cpu-intensive workloads. Each
> VM runs a single instance of the workload. The workload is configured
> to create one thread for each vcpu in the VM. The workload is then
> capable of completely saturating each vcpu in each VM.
>
> CFS was tested using two different topologies.
>
> First, vcpu cgroups were created under each VM cgroup created by
> libvirt. The vcpu threads from the VM's cgroup/tasks were moved to the
> tasks list of each vcpu cgroup, one thread to each vcpu cgroup. This
> tree structure permits setting the CFS quota and period per vcpu.
> Default values for cpu.shares (1024), quota (-1) and period (500000us)
> were used in each VM cgroup and inherited by the vcpu cgroups. With
> these settings the workload generated system cpu utilization (measured
> in the host) of >99% guest, >0.1% idle, 0.14% user and 0.38% system.
>
> Second, using the same topology, the CFS quota in each vcpu's cgroup
> was set to 250000us, allowing each vcpu to consume 50% of a cpu. The
> cpu workload was run again. This time the total system cpu utilization
> was measured at 75% guest, ~24% idle, 0.15% user and 0.40% system.
>
> The topology was then changed such that a cgroup for each vcpu was
> created directly in /cgroup/cpu.
>
> The first test used the default/inherited shares and CFS quota and
> period. The measured system cpu utilization was >99% guest, ~0.5%
> idle, 0.13% user and 0.38% system, similar to the default settings
> using vcpu cgroups under libvirt.
>
> The next test, like before the topology change, set the vcpu quota
> values to 250000us, or 50% of a cpu. In this case the measured system
> cpu utilization was ~92% guest, ~7.5% idle, 0.15% user and 0.38%
> system.
>
> We can see that moving the vcpu cgroups out from under libvirt/qemu
> makes a big difference in idle cpu time.
>
> Does this suggest a possible problem with libvirt?

I do not think it is a problem in libvirt. Libvirt only uses the
interface provided by the cgroup system. It may be a problem in cgroups
or in CFS bandwidth.

> ------------------
>
> Has anyone else seen this type of behavior when using cgroups with CFS
> hard limits? We are working with the kernel community to see if there
> might be a bug in cgroups itself.
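
(For reference, here is a minimal sketch of the kind of per-vcpu quota
setup Mark describes above. The cgroup mount point, the libvirt-style
path and the domain name "guest1" are assumptions for illustration
only; the real hierarchy depends on how the cpu controller is mounted
and how libvirt names the domain's group.)

--- snip ---
/* Create a hypothetical per-vcpu cgroup under a domain's cpu cgroup
 * and apply a 50% CFS cap (quota = half of the 500000us period), as in
 * the second test above. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static int write_value(const char *dir, const char *file, long long val)
{
    char path[512];
    FILE *fp;

    snprintf(path, sizeof(path), "%s/%s", dir, file);
    fp = fopen(path, "w");
    if (!fp)
        return -1;
    fprintf(fp, "%lld\n", val);
    return fclose(fp);
}

int main(void)
{
    const char *vcpu0 = "/cgroup/cpu/libvirt/qemu/guest1/vcpu0";

    mkdir(vcpu0, 0755);                               /* per-vcpu group */
    write_value(vcpu0, "cpu.cfs_period_us", 500000);  /* 500ms period   */
    write_value(vcpu0, "cpu.cfs_quota_us", 250000);   /* 50% of one cpu */
    /* The vcpu thread's tid would then be written into vcpu0/tasks. */
    return 0;
}
--- snip ---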
>
>
> === 2 ===
> Something else we are seeing is that libvirt's default setting for
> cputune.share is 1024 for any domain (regardless of how many vcpus are
> configured). This ends up hindering performance of really large VMs
> (with lots of vcpus) as compared to smaller ones since all domains are
> given an equal share. Would folks consider changing the default for
> 'shares' to be a quantity scaled by the number of vcpus such that
> bigger domains get to use proportionally more host cpu resource?

The value 1024 is the kernel's default, not libvirt's. If you want to
change cputune.share, you should edit the domain's xml config file.

>
>
> === 3 ===
> Besides the above issues, I would like to open a discussion on what
> the libvirt API for enabling cpu hard limits should look like. Here is
> what I was thinking:

I need this feature as soon as the CFS bandwidth patchset is merged
into the upstream kernel, so I have been working on this recently.

>
> Two additional scheduler parameters (based on the names given in the
> cgroup fs) will be recognized for qemu domains: 'cfs_period' and
> 'cfs_quota'. These can use the existing
> virDomain[Get|Set]SchedulerParameters() API. The Domain XML schema
> would be updated to permit the following:
>
> --- snip ---
> <cputune>
>   ...
>   <cfs_period>1000000</cfs_period>
>   <cfs_quota>500000</cfs_quota>
> </cputune>
> --- snip ---
>
> To actuate these configuration settings, we simply apply the values to
> the appropriate cgroup(s) for the domain. We would prefer that each
> vcpu be in its own cgroup to ensure equal and fair scheduling across
> all vcpus running on the system. (We will need to resolve the issues
> described by Mark in order to figure out where to hang these cgroups.)

Each vcpu in its own cgroup? Do you mean each vcpu has a separate
thread? AFAIK, qemu does not create a thread for each vcpu.

Thanks.
Wen Congyang

>
>
> Thanks for sticking with me through this long email. I greatly
> appreciate your thoughts and comments on these topics.
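
(Regarding the API proposal in === 3 ===: below is a minimal sketch of
how the proposed 'cfs_period'/'cfs_quota' parameters could be driven
through the existing virDomainSetSchedulerParameters() API. The
parameter names and types follow the proposal above and are assumptions
rather than an interface libvirt provides today; the domain name
"guest1" is hypothetical.)

--- snip ---
#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom;
    virSchedParameter params[2];

    if (conn == NULL)
        return 1;

    dom = virDomainLookupByName(conn, "guest1");   /* hypothetical domain */
    if (dom == NULL) {
        virConnectClose(conn);
        return 1;
    }

    memset(params, 0, sizeof(params));

    /* Proposed names, mirroring cpu.cfs_period_us / cpu.cfs_quota_us. */
    strncpy(params[0].field, "cfs_period", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    params[0].type = VIR_DOMAIN_SCHED_FIELD_ULLONG;
    params[0].value.ul = 1000000;                  /* period in usec */

    strncpy(params[1].field, "cfs_quota", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    params[1].type = VIR_DOMAIN_SCHED_FIELD_LLONG;
    params[1].value.l = 500000;                    /* quota in usec */

    if (virDomainSetSchedulerParameters(dom, params, 2) < 0)
        fprintf(stderr, "failed to set CFS bandwidth parameters\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
--- snip ---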