Re: [libvirt PATCH 0/4] src: add configurable support for cgroups usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 20, 2020 at 01:40:10PM +0000, Daniel P. Berrangé wrote:
> This simple series allows apps to choose how cgroups are managed by
> libvirt, between cgroupfs and machined, or disabled entirely.

I'm not so sure about this series.  The situation with cgroups and
systemd is a bit more complex then the current code handles.

There is an existing issue where we are violating the delegation rules
described by cgroups and systemd.

Currently the "cgroupfs" approach is used only on non-systemd hosts and
we should keep it that way.  Libvirt is not allowed to mangle with
cgroups owned by systemd so IMHO the warning in configuration file is
not enough because without delegation cgroups will not work properly in
libvirt.

One example is that without delegation the VM cgroups would not have any
controllers enabled by default.

Having an option to completely disable cgroups is OK but that's the only
thing I would expose for now.

When using machined the current topology looks like this (only the files
that we use are listed):

/sys/fs/cgroup/machine.slice/
└── machine-qemu\x2d1\x2dcentos8.scope
    ├── emulator
    │   ├── cgroup.controllers
    │   ├── cgroup.threads
    │   ├── cgroup.type
    │   ├── cpuset.cpus
    │   └── cpuset.mems
    ├── vcpu0
    │   ├── cgroup.controllers
    │   ├── cgroup.threads
    │   ├── cgroup.type
    │   ├── cpuset.cpus
    │   └── cpuset.mems
    ├── vcpu1
    │   ├── cgroup.controllers
    │   ├── cgroup.threads
    │   ├── cgroup.type
    │   ├── cpuset.cpus
    │   └── cpuset.mems
    ├── vcpu2
    │   ├── cgroup.controllers
    │   ├── cgroup.threads
    │   ├── cgroup.type
    │   ├── cpuset.cpus
    │   └── cpuset.mems
    ├── vcpu3
    │   ├── cgroup.controllers
    │   ├── cgroup.threads
    │   ├── cgroup.type
    │   ├── cpuset.cpus
    │   └── cpuset.mems
    ├── cgroup.controllers
    ├── cgroup.procs
    ├── cgroup.subtree_control
    ├── cgroup.type
    ├── cpu.max
    ├── cpu.stat
    ├── cpu.weight
    ├── io.bfq.weight
    ├── io.max
    ├── io.stat
    ├── io.weight
    ├── memory.high
    ├── memory.max
    ├── memory.stat
    └── memory.swap.max

which is incorrect.  The group "machine-qemu\x2d1\x2dcentos8.scope" is
marked as delegated which means we can create new sub-cgroups there and
that we can access only these files directly in that group:

    cgroup.subtree_control - to enable cgroup controllers for our
                             sub-cgroups

    cgroup.procs - to move processes around into our sub-cgroups

Other files are off limit and we should not touch them.  In addition
systemd may create its own sub-cgroups and it would fail to do so
because there is another limitation from kernel that processes can live
only in the leaves except for the root cgroup.

Exactly that is happening right now and it was discovered by user
reporting a BUG that systemctl daemon-reloaed changed the values in
cpu.shares because it owns that file and it was our fault to change it.

Currently we use these cgroups files:

    cgroup.type

        - to enable threaded mode that is required for vcpu, emulator 
          and iothread pinning.  We should not do that in the delegated
          cgroup because we may break systemd, this has to be moved to
          a sub-cgroup together with the pinning.

    io.weight / io.bfq.weight, cpu.weight

        - These cannot be moved into the sub-cgroup because it would
          lose the effect and would be completely pointless to set it.
          All the *.weight files works in a way that the resources
          available to parent are distributed using the weight to the
          children.  If we would move it to the sub-cgroup there would
          be only single child all the time.  Therefore we have to use
          D-Bus to talk to systemd to have these values configured
          directly for the delegated cgroup.

    io.max, memory.max, memory.high, memory.swap.max, cpu.max

        - All of these can be safely set in the sub-cgroup that we will
          have to create as they are absolute limits and they are not
          affected by siblings.

    cpuset.mems, cpuset.cpus

        - These are used together with the threaded model and can also
          be safely set in the sub-cgroup.

Based on all of the above this is the new topology that has to be
created by libvirt if systemd is present:

/sys/fs/cgroup/machine.slice/
└── machine-qemu\x2d1\x2dcentos8.scope
    ├── libvirt-vm (or something else)
    │   ├── emulator
    │   │   ├── cgroup.controllers
    │   │   ├── cgroup.threads
    │   │   ├── cgroup.type
    │   │   ├── cpuset.cpus
    │   │   └── cpuset.mems
    │   ├── vcpu0
    │   │   ├── cgroup.controllers
    │   │   ├── cgroup.threads
    │   │   ├── cgroup.type
    │   │   ├── cpuset.cpus
    │   │   └── cpuset.mems
    │   ├── vcpu1
    │   │   ├── cgroup.controllers
    │   │   ├── cgroup.threads
    │   │   ├── cgroup.type
    │   │   ├── cpuset.cpus
    │   │   └── cpuset.mems
    │   ├── vcpu2
    │   │   ├── cgroup.controllers
    │   │   ├── cgroup.threads
    │   │   ├── cgroup.type
    │   │   ├── cpuset.cpus
    │   │   └── cpuset.mems
    │   ├── vcpu3
    │   │   ├── cgroup.controllers
    │   │   ├── cgroup.threads
    │   │   ├── cgroup.type
    │   │   ├── cpuset.cpus
    │   │   └── cpuset.mems
    │   ├── cgroup.controllers
    │   ├── cgroup.procs
    │   ├── cgroup.subtree_control
    │   ├── cgroup.type
    │   ├── cpu.max
    │   ├── cpu.stat
    │   ├── io.max
    │   ├── io.stat
    │   ├── memory.high
    │   ├── memory.max
    │   ├── memory.stat
    │   └── memory.swap.max
    ├── cgroup.procs
    ├── cgroup.subtree_control
    ├── cpu.weight          (only via systemd using D-Bus)
    ├── io.bfq.weight       (only via systemd using D-Bus)
    └── io.weight           (only via systemd using D-Bus)

Pavel

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux