From: "Daniel P. Berrange" <berrange@xxxxxxxxxx> As of libvirt 1.1.1 and systemd 205, the cgroups layout used by libvirt has some changes. Update the 'cgroups.html' file from the website to describe how it works in a systemd world. Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx> --- docs/cgroups.html.in | 212 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 172 insertions(+), 40 deletions(-) diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in index 77656b2..46cfb7b 100644 --- a/docs/cgroups.html.in +++ b/docs/cgroups.html.in @@ -47,17 +47,121 @@ <p> As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been simplified, in order to facilitate the setup of resource control policies by - administrators / management applications. The layout is based on the concepts of - "partitions" and "consumers". Each virtual machine or container is a consumer, - and has a corresponding cgroup named <code>$VMNAME.libvirt-{qemu,lxc}</code>. - Each consumer is associated with exactly one partition, which also have a - corresponding cgroup usually named <code>$PARTNAME.partition</code>. The - exceptions to this naming rule are the three top level default partitions, - named <code>/system</code> (for system services), <code>/user</code> (for - user login sessions) and <code>/machine</code> (for virtual machines and - containers). By default every consumer will of course be associated with - the <code>/machine</code> partition. This leads to a hierarchy that looks - like + administrators / management applications. The new layout is based on the concepts + of "partitions" and "consumers". A "consumer" is a cgroup which holds the + processes for a single virtual machine or container. A "partition" is a cgroup + which does not contain any processes, but can have resource controls applied. + A "partition" will have zero or more child directories which may be either + "consumer" or "partition". + </p> + + <p> + As of libvirt 1.1.1 or later, the cgroups layout will have some slight + differences when running on a host with systemd 205 or later. The overall + tree structure is the same, but there are some differences in the naming + conventions for the cgroup directories. Thus the following docs split + in two, one describing systemd hosts and the other non-systemd hosts. + </p> + + <h3><a name="currentLayoutSystemd">Systemd cgroups integration</a></h3> + + <p> + On hosts which use systemd, each consumer maps to a systemd scope unit, + while partitions map to a system slice unit. + </p> + + <h4><a name="systemdScope">Systemd scope naming</a></h4> + + <p> + The systemd convention is for the scope name of virtual machines / containers + to be of the general format <code>machine-$NAME.scope</code>. Libvirt forms the + <code>$NAME</code> part of this by concatenating the driver type with the name + of the guest, and then escaping any systemd reserved characters. + So for a guest <code>demo</code> running under the <code>lxc</code> driver, + we get a <code>$NAME</code> of <code>lxc-demo</code> which when escaped is + <code>lxc\x2ddemo</code>. So the complete scope name is <code>machine-lxc\x2ddemo.scope</code>. + The scope names map directly to the cgroup directory names. + </p> + + <h4><a name="systemdSlice">Systemd slice naming</a></h4> + + <p> + The systemd convention for slice naming is that a slice should include the + name of all of its parents prepended on its own name. 
+    <h4><a name="systemdLayout">Systemd cgroup layout</a></h4>
+
+    <p>
+      Given this, a possible systemd cgroups layout involving 3 qemu guests,
+      3 lxc containers and 3 custom child slices would be:
+    </p>
+
+    <pre>
+$ROOT
+  |
+  +- system.slice
+  |   |
+  |   +- libvirtd.service
+  |
+  +- machine.slice
+      |
+      +- machine-qemu\x2dvm1.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm2.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-qemu\x2dvm3.scope
+      |   |
+      |   +- emulator
+      |   +- vcpu0
+      |   +- vcpu1
+      |
+      +- machine-engineering.slice
+      |   |
+      |   +- machine-engineering-testing.slice
+      |   |   |
+      |   |   +- machine-lxc\x2dcontainer1.scope
+      |   |
+      |   +- machine-engineering-production.slice
+      |       |
+      |       +- machine-lxc\x2dcontainer2.scope
+      |
+      +- machine-marketing.slice
+          |
+          +- machine-lxc\x2dcontainer3.scope
+    </pre>
+
+    <h3><a name="currentLayoutGeneric">Non-systemd cgroups layout</a></h3>
+
+    <p>
+      On hosts which do not use systemd, each consumer has a corresponding cgroup
+      named <code>$VMNAME.libvirt-{qemu,lxc}</code>. Each consumer is associated
+      with exactly one partition, which also has a corresponding cgroup usually
+      named <code>$PARTNAME.partition</code>. The exceptions to this naming rule
+      are the three top level default partitions, named <code>/system</code> (for
+      system services), <code>/user</code> (for user login sessions) and
+      <code>/machine</code> (for virtual machines and containers). By default
+      every consumer will of course be associated with the <code>/machine</code>
+      partition.
+    </p>
+
+    <p>
+      Given this, a possible non-systemd cgroups layout involving 3 qemu guests,
+      3 lxc containers and 3 custom child partitions would be:
     </p>

     <pre>
@@ -87,23 +191,21 @@ $ROOT
       |   +- vcpu0
       |   +- vcpu1
       |
-      +- container1.libvirt-lxc
-      |
-      +- container2.libvirt-lxc
+      +- engineering.partition
+      |   |
+      |   +- testing.partition
+      |   |   |
+      |   |   +- container1.libvirt-lxc
+      |   |
+      |   +- production.partition
+      |       |
+      |       +- container2.libvirt-lxc
       |
-      +- container3.libvirt-lxc
+      +- marketing.partition
+          |
+          +- container3.libvirt-lxc
     </pre>

-    <p>
-      The default cgroups layout ensures that, when there is contention for
-      CPU time, it is shared equally between system services, user sessions
-      and virtual machines / containers. This prevents virtual machines from
-      locking the administrator out of the host, or impacting execution of
-      system services. Conversely, when there is no contention from
-      system services / user sessions, it is possible for virtual machines
-      to fully utilize the host CPUs.
-    </p>
-
     <h2><a name="customPartiton">Using custom partitions</a></h2>

     <p>
@@ -127,12 +229,54 @@ $ROOT
     </pre>

     <p>
+      Note that the partition names in the guest XML use a
+      generic naming format, not the low level naming convention
+      required by the underlying host OS, i.e. you should not include
+      any of the <code>.partition</code> or <code>.slice</code>
+      suffixes in the XML config. Given a partition name
+      <code>/machine/production</code>, libvirt will automatically
+      apply the platform specific translation required to get
+      <code>/machine/production.partition</code> (non-systemd)
+      or <code>/machine.slice/machine-production.slice</code>
+      (systemd) as the underlying cgroup name.
+    </p>
+
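+    <p>
+      As an illustration of this translation, a partition named
+      <code>/machine/engineering/testing</code> would typically be backed by
+      cgroup directories along the lines of the following, where
+      <code>$CG_MOUNT</code> stands for wherever the relevant controller is
+      mounted on the host (e.g. <code>/sys/fs/cgroup/cpu,cpuacct</code> on
+      many systemd hosts):
+    </p>
+
+    <pre>
+non-systemd: $CG_MOUNT/machine/engineering.partition/testing.partition
+systemd:     $CG_MOUNT/machine.slice/machine-engineering.slice/machine-engineering-testing.slice
+    </pre>
+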
+    <p>
       Libvirt will not auto-create the cgroups directory to back
       this partition. In the future, libvirt / virsh will provide
       APIs / commands to create custom partitions, but currently
-      this is left as an exercise for the administrator. For
-      example, given the XML config above, the admin would need
-      to create a cgroup named '/machine/production.partition'
+      this is left as an exercise for the administrator.
+    </p>
+
+    <p>
+      <strong>Note:</strong> the ability to place guests in custom
+      partitions is only available with libvirt >= 1.0.5, using
+      the new cgroup layout. The legacy cgroups layout described
+      later in this document did not support customization per guest.
+    </p>
+
+    <h3><a name="createSystemd">Creating custom partitions (systemd)</a></h3>
+
+    <p>
+      Given the XML config above, the admin on a systemd based host would
+      need to create a unit file <code>/etc/systemd/system/machine-production.slice</code>:
+    </p>
+
+    <pre>
+# cat > /etc/systemd/system/machine-production.slice &lt;&lt;EOF
+[Unit]
+Description=VM production slice
+Before=slices.target
+Wants=machine.slice
+EOF
+# systemctl start machine-production.slice
+    </pre>
+
+    <h3><a name="createNonSystemd">Creating custom partitions (non-systemd)</a></h3>
+
+    <p>
+      Given the XML config above, the admin on a non-systemd based host
+      would need to create a cgroup named '/machine/production.partition':
     </p>

@@ -147,18 +291,6 @@ $ROOT
   done
     </pre>

-    <p>
-      <strong>Note:</strong> the cgroups directory created as a ".partition"
-      suffix, but the XML config does not require this suffix.
-    </p>
-
-    <p>
-      <strong>Note:</strong> the ability to place guests in custom
-      partitions is only available with libvirt >= 1.0.5, using
-      the new cgroup layout. The legacy cgroups layout described
-      later did not support customization per guest.
-    </p>
-
     <h2><a name="resourceAPIs">Resource management APIs/commands</a></h2>

     <p>
--
1.8.3.1

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list