On Fri, Mar 20, 2020 at 05:42:57PM +0100, Pavel Hrdina wrote: > On Fri, Mar 20, 2020 at 03:59:41PM +0000, Daniel P. Berrangé wrote: > > On Fri, Mar 20, 2020 at 04:48:58PM +0100, Pavel Hrdina wrote: > > > On Fri, Mar 20, 2020 at 03:39:58PM +0000, Daniel P. Berrangé wrote: > > > > On Fri, Mar 20, 2020 at 04:30:07PM +0100, Pavel Hrdina wrote: > > > > > On Fri, Mar 20, 2020 at 01:40:10PM +0000, Daniel P. Berrangé wrote: > > > > > > This simple series allows apps to choose how cgroups are managed by > > > > > > libvirt, between cgroupfs and machined, or disabled entirely. > > > > > > > > > > I'm not so sure about this series. The situation with cgroups and > > > > > systemd is a bit more complex then the current code handles. > > > > > > > > > > There is an existing issue where we are violating the delegation rules > > > > > described by cgroups and systemd. > > > > > > > > > > Currently the "cgroupfs" approach is used only on non-systemd hosts and > > > > > we should keep it that way. Libvirt is not allowed to mangle with > > > > > cgroups owned by systemd so IMHO the warning in configuration file is > > > > > not enough because without delegation cgroups will not work properly in > > > > > libvirt. > > > > > > > > That isn't the case currently AFAICT from current code. > > > > > > > > Before this series, the virCgroupNewMachine method will first try > > > > systemd and then fallback to directly cgroupfs. > > > > > > > > This fallback can happen on a systemd host, when machined is not > > > > installed, as machined is an optional component. > > > > > > Right, I should have checked the code. > > > > > > > If we need to mandate use of systemd on systemd hosts, then our > > > > existing code is broken and needs fixing. > > > > > > I think we should do that because without delegation we should not touch > > > anything in cgroups. > > > > > > > I'm happy todo such a fix, and then adjust this series to take > > > > account of it. Essentially we'd allow apps to specify 'cgroupfs' > > > > or 'machined' but we'd enforce they only make safe choices. > > > > > > > > ie on a systemd host we'd only allow 'none' or 'machined' > > > > > > > > On a non-systemd host, or on a systemd host where we've > > > > been delegated a subtree, we'd only allow 'none' or 'cgroupfs' > > > > > > This sounds reasonable, I was thinking about the check if we have > > > delegated subtree but decided not to mention it. The only place where > > > we can check the delegation is using systemd API, probably over D-Bus. > > > > > > It's not reflected anywhere in the cgroup files. > > > > The key scenario for cgroupfs would be if libvirtd was run inside a > > container on a host that otherwise uses systemd. IIUC, the container > > should get given a delegated subtree and we need to be able to use > > cgroupfs backend here. Currently this works (accidentally) because > > we'll try to talk to machined and fail, so fallback to direct usage. > > That should be OK and we can support this even outside of containers if > we can verify that the cgroup subtree is delegated and it's safe to use > it for VMs. > > The wrong usage of delegation by libvirt is the source of the reported > BUG so we should be restrictive. Do you have a pointer to the bugs you've seen in this area. In particular I'm wondering if there's a need for different behaviour with v1 vs v2 cgroups. Systemd has always wanted proper delegation behaviour from apps, but IIUC, this was mostly a nice-to-have under v1, compared to a must-have under v2. My big concern is that we put in a restriction and accidentally break someone's deployment scenario under v1. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|