From: "Daniel P. Berrange" <berrange@xxxxxxxxxx>

Describe the new cgroups layout, how to customize placement
of guests and what virsh commands are used to access the
parameters.

Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
---
 docs/cgroups.html.in | 285 +++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/sitemap.html.in |   4 +
 2 files changed, 289 insertions(+)
 create mode 100644 docs/cgroups.html.in

diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in
new file mode 100644
index 0000000..3be0672
--- /dev/null
+++ b/docs/cgroups.html.in
@@ -0,0 +1,285 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <body>
+    <h1>Control Groups Resource Management</h1>
+
+    <ul id="toc"></ul>
+
+    <p>
+      The QEMU and LXC drivers make use of the Linux "Control Groups" facility
+      for applying resource management to their virtual machines & containers.
+    </p>
+
+    <h2><a name="requiredControllers">Required controllers</a></h2>
+
+    <p>
+      The control groups filesystem supports multiple "controllers". By default
+      the init system (such as systemd) should mount all controllers compiled
+      into the kernel at <code>/sys/fs/cgroup/$CONTROLLER-NAME</code>. Libvirt
+      will never attempt to mount any controllers itself, merely detect where
+      they are mounted.
+    </p>
+
+    <p>
+      The QEMU driver is capable of using the <code>cpuset</code>,
+      <code>cpu</code>, <code>memory</code>, <code>blkio</code> and
+      <code>devices</code> controllers. None of them are compulsory.
+      If any controller is not mounted, the resource management APIs
+      which use it will cease to operate. It is possible to explicitly
+      turn off use of a controller, even when mounted, via the
+      <code>/etc/libvirt/qemu.conf</code> configuration file.
+    </p>
+
+    <p>
+      The LXC driver is capable of using the <code>cpuset</code>,
+      <code>cpu</code>, <code>freezer</code>, <code>memory</code>,
+      <code>blkio</code> and <code>devices</code> controllers.
+      The <code>cpuset</code>, <code>devices</code> and
+      <code>memory</code> controllers are compulsory. Without
+      them mounted, no containers can be started. If any of the
+      other controllers are not mounted, the resource management APIs
+      which use them will cease to operate.
+    </p>
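+
+    <p>
+      For illustration only (the exact set of mounted controllers varies by
+      distribution and kernel configuration, and the
+      <code>cgroup_controllers</code> line shown reflects a hypothetical
+      edit rather than the shipped default), an administrator might first
+      confirm what the init system has mounted and then restrict the QEMU
+      driver to a subset of controllers via
+      <code>/etc/libvirt/qemu.conf</code>:
+    </p>
+
+    <pre>
+# ls /sys/fs/cgroup
+blkio  cpu,cpuacct  cpuset  devices  freezer  memory  net_cls  perf_event  systemd
+
+# grep cgroup_controllers /etc/libvirt/qemu.conf
+cgroup_controllers = [ "cpu", "devices", "memory", "blkio" ]
+</pre>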
+
+    <h2><a name="currentLayout">Current cgroups layout</a></h2>
+
+    <p>
+      As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
+      simplified, in order to facilitate the setup of resource control policies by
+      administrators / management applications. The layout is based on the concepts of
+      "partitions" and "consumers". Each virtual machine or container is a consumer,
+      and has a corresponding cgroup named <code>$VMNAME.libvirt-{qemu,lxc}</code>.
+      Each consumer is associated with exactly one partition, which also has a
+      corresponding cgroup, usually named <code>$PARTNAME.partition</code>. The
+      exceptions to this naming rule are the three top level default partitions,
+      named <code>/system</code> (for system services), <code>/user</code> (for
+      user login sessions) and <code>/machine</code> (for virtual machines and
+      containers). By default every consumer will of course be associated with
+      the <code>/machine</code> partition. This leads to a hierarchy that looks
+      like
+    </p>
+
+    <pre>
+$ROOT
+  |
+  +- system
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine
+      |
+      +- vm1.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- vm2.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- vm3.libvirt-qemu
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- container1.libvirt-lxc
+      |
+      +- container2.libvirt-lxc
+      |
+      +- container3.libvirt-lxc
+    </pre>
+
+    <p>
+      The default cgroups layout ensures that, when there is contention for
+      CPU time, it is shared equally between system services, user sessions
+      and virtual machines / containers. This prevents virtual machines from
+      locking the administrator out of the host, or impacting execution of
+      system services. Conversely, when there is no contention from
+      system services / user sessions, it is possible for virtual machines
+      to fully utilize the host CPUs.
+    </p>
+
+    <h2><a name="customPartiton">Using custom partitions</a></h2>
+
+    <p>
+      If there is a need to apply resource constraints to groups of
+      virtual machines or containers, then the single default
+      partition <code>/machine</code> may not be sufficiently
+      flexible. The administrator may wish to sub-divide the
+      default partition, for example into "testing" and "production"
+      partitions, and then assign each guest to a specific
+      sub-partition. This is achieved via a small addition to the
+      guest domain XML config, just below the main <code>domain</code>
+      element
+    </p>
+
+    <pre>
+  ...
+  <resource>
+    <partition>/machine/production</partition>
+  </resource>
+  ...
+    </pre>
+
+    <p>
+      Libvirt will not auto-create the cgroups directory to back
+      this partition. In the future, libvirt / virsh will provide
+      APIs / commands to create custom partitions, but currently
+      this is left as an exercise for the administrator. For
+      example, given the XML config above, the admin would need
+      to create a cgroup named '/machine/production.partition'
+    </p>
+
+    <pre>
+# cd /sys/fs/cgroup
+# for i in blkio cpu,cpuacct cpuset devices freezer memory net_cls perf_event
+  do
+    mkdir $i/machine/production.partition
+  done
+# for i in cpuset.cpus cpuset.mems
+  do
+    cat cpuset/machine/$i > cpuset/machine/production.partition/$i
+  done
+</pre>
+
+    <p>
+      <strong>Note:</strong> the cgroups directory is created with a ".partition"
+      suffix, but the XML config does not require this suffix.
+    </p>
+
+    <p>
+      <strong>Note:</strong> the ability to place guests in custom
+      partitions is only available with libvirt >= 1.0.5, using
+      the new cgroup layout. The legacy cgroups layout described
+      later did not support customization per guest.
+    </p>
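+
+    <p>
+      As a minimal sketch of how to check the result (assuming a hypothetical
+      QEMU guest named "demo" carrying the <code>resource</code> element shown
+      above, and the cpu,cpuacct controller mounted at the path used in the
+      previous example), the consumer cgroup should appear underneath the
+      custom partition once the guest is running:
+    </p>
+
+    <pre>
+# virsh start demo
+Domain demo started
+
+# ls -d /sys/fs/cgroup/cpu,cpuacct/machine/production.partition/demo.libvirt-qemu
+/sys/fs/cgroup/cpu,cpuacct/machine/production.partition/demo.libvirt-qemu
+</pre>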
+
+    <h2><a name="resourceAPIs">Resource management APIs/commands</a></h2>
+
+    <p>
+      Since libvirt aims to provide an API which is portable across
+      hypervisors, the concept of cgroups is not exposed directly
+      in the API or XML configuration. It is considered to be an
+      internal implementation detail. Instead libvirt provides a
+      set of APIs for applying resource controls, which are then
+      mapped to corresponding cgroup tunables
+    </p>
+
+    <h3>Scheduler tuning</h3>
+
+    <p>
+      Parameters from the "cpu" controller are exposed via the
+      <code>schedinfo</code> command in virsh.
+    </p>
+
+    <pre>
+# virsh schedinfo demo
+Scheduler      : posix
+cpu_shares     : 1024
+vcpu_period    : 100000
+vcpu_quota     : -1
+emulator_period: 100000
+emulator_quota : -1</pre>
+
+    <h3>Block I/O tuning</h3>
+
+    <p>
+      Parameters from the "blkio" controller are exposed via the
+      <code>blkiotune</code> command in virsh.
+    </p>
+
+    <pre>
+# virsh blkiotune demo
+weight         : 500
+device_weight  : </pre>
+
+    <h3>Memory tuning</h3>
+
+    <p>
+      Parameters from the "memory" controller are exposed via the
+      <code>memtune</code> command in virsh.
+    </p>
+
+    <pre>
+# virsh memtune demo
+hard_limit     : 580192
+soft_limit     : unlimited
+swap_hard_limit: unlimited
+    </pre>
+
+    <h3>Network tuning</h3>
+
+    <p>
+      The <code>net_cls</code> controller is not currently used. Instead
+      traffic filter policies are set directly against individual virtual
+      network interfaces.
+    </p>
+
+    <h2><a name="legacyLayout">Legacy cgroups layout</a></h2>
+
+    <p>
+      Prior to libvirt 1.0.5, the cgroups layout created by libvirt was different
+      from that described above, and did not allow for administrator customization.
+      Libvirt used a fixed, 3-level hierarchy <code>libvirt/{qemu,lxc}/$VMNAME</code>
+      which was rooted at the point in the hierarchy where libvirtd itself was
+      located. So if libvirtd was placed at <code>/system/libvirtd.service</code>
+      by systemd, the groups for each virtual machine / container would be located
+      at <code>/system/libvirtd.service/libvirt/{qemu,lxc}/$VMNAME</code>. In addition
+      to this, the QEMU driver created further child cgroups for each vCPU thread
+      and the emulator thread(s). This led to a hierarchy that looked like
+    </p>
+
+    <pre>
+$ROOT
+  |
+  +- system
+      |
+      +- libvirtd.service
+           |
+           +- libvirt
+               |
+               +- qemu
+               |   |
+               |   +- vm1
+               |   |   |
+               |   |   +- emulator
+               |   |   +- vcpu0
+               |   |   +- vcpu1
+               |   |
+               |   +- vm2
+               |   |   |
+               |   |   +- emulator
+               |   |   +- vcpu0
+               |   |   +- vcpu1
+               |   |
+               |   +- vm3
+               |       |
+               |       +- emulator
+               |       +- vcpu0
+               |       +- vcpu1
+               |
+               +- lxc
+                   |
+                   +- container1
+                   |
+                   +- container2
+                   |
+                   +- container3
+    </pre>
+
+    <p>
+      Although current kernels are much improved, historically the use of deep
+      hierarchies had a significant negative impact on kernel scalability.
+      The legacy libvirt cgroups layout highlighted these problems, to the
+      detriment of the performance of virtual machines and containers.
+    </p>
+  </body>
+</html>
diff --git a/docs/sitemap.html.in b/docs/sitemap.html.in
index 619e4a1..cb7cc5b 100644
--- a/docs/sitemap.html.in
+++ b/docs/sitemap.html.in
@@ -89,6 +89,10 @@
       <span>Ensuring exclusive guest access to disks</span>
     </li>
     <li>
+      <a href="cgroups.html">CGroups</a>
+      <span>Control groups integration</span>
+    </li>
+    <li>
       <a href="hooks.html">Hooks</a>
       <span>Hooks for system specific management</span>
     </li>
--
1.8.1.4

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list