Re: Memory reclaim protection and cgroup nesting (desktop use)

Benjamin Berg <benjamin@xxxxxxxxxxxxxxxx> · Thu, 05 Mar 2020 16:27:19 +0100

On Thu, 2020-03-05 at 09:55 -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Mar 05, 2020 at 02:13:58PM +0100, Benjamin Berg wrote:
> > A major discussion point seemed to be that cgroups should be grouped by
> > their resource management needs rather than a logical hierarchy. I
> > think that the resource management needs actually map well enough to
> > the logical hierarchy in our case. The hierarchy looks like:
> 
> Yeah, the two layouts share a lot of commonalities in most cases. It's
> not like we usually wanna distribute resources completely unrelated to
> how the system is composed logically.
> 
> >                          root
> >                        /     \
> >            system.slice       user.slice
> >           /    |              |         \
> >       cron  journal    user-1000.slice   user-1001.slice
> >                               |                      \
> >                       user@1000.service            [SAME]
> >                         |          |
> >                    apps.slice   session.slice
> >                        |             |
> >                   unprotected    protected
> > 
> ...
> > I think this actually makes sense. Both from an hierarchical point of
> > view and also for configuring resources. In particular the user-.slice
> > layer is important, because this grouping allows us to dynamically
> > adjust resource management. The obvious thing we can do there is to
> > prioritise the currently active user while also lowering resource
> > allocations for inactive users (e.g. graphical greeter still running in
> > the background).
> 
> Changing memory limits dynamically can lead to pretty abrupt system
> behaviors depending on how big the swing is but memory.low and io/cpu
> weights should behave fine.

Right, we'll need some daemon to handle this, so we could even smooth
out any change over a period of time. But it seems that will not be
needed, as I don't expect we'll want to change anything beyond
memory.low and io/cpu weights.
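
For illustration only, smoothing a memory.low change over time could
look roughly like the sketch below. This is not an existing daemon. The
function names, step count and interval are hypothetical, and the only
real interface assumed is the cgroup v2 memory.low file, which holds
either a byte count or "max".

```python
import time
from pathlib import Path


def ramp_steps(current: int, target: int, steps: int) -> list[int]:
    """Return `steps` evenly spaced values from current towards target.

    The last value is always exactly `target`.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    delta = target - current
    return [current + delta * i // steps for i in range(1, steps + 1)]


def apply_ramp(cgroup_dir: str, target: int, steps: int = 10,
               interval: float = 1.0) -> None:
    """Move memory.low of one cgroup to `target` bytes in small steps.

    Hypothetical helper: cgroup_dir is assumed to be a cgroup v2
    directory such as /sys/fs/cgroup/user.slice/user-1000.slice.
    """
    low_file = Path(cgroup_dir) / "memory.low"
    raw = low_file.read_text().strip()
    if raw == "max":
        # There is no finite starting point to ramp from, so set directly.
        low_file.write_text(f"{target}\n")
        return
    for value in ramp_steps(int(raw), target, steps):
        low_file.write_text(f"{value}\n")
        time.sleep(interval)
```

Writing to that file needs write access to the cgroup, which is one
more reason this would live in a daemon rather than in the session.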

I opened
  https://github.com/systemd/systemd/issues/15028
to discuss this further. I'll update the ticket with more pointers and
information later.

> > Note, that from my point of view the scenario that most concerns me is
> > a resource competition between session.slice and its siblings. This
> > makes the hierarchy above even less important; we just need to give the
> > user enough control to do resource allocations within their own
> > subtree.
> > 
> > So, it seems to me that the suggested mount option should work well in
> > our scenario.
> 
> Sounds great. In our experience, what would help quite a lot is using
> per-application cgroups more (e.g. containing each application as user
> services) so that one misbehaving command can't overwhelm the session
> and eventually when oomd has to kick in, it can identify and kill only
> the culprit application rather than the whole session.

We are already trying to do this in GNOME. :)

Right now, though, GNOME only moves processes into cgroups after
launching them (i.e. into transient systemd scopes).

But the goal is to improve this further and launch all applications
directly through systemd (i.e. as systemd services). systemd itself is
going to define some standards to facilitate this, and we'll probably
also need to update some XDG standards.
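
As a rough illustration of the difference between the two approaches,
the sketch below builds a systemd-run command line that puts one
application into its own unit, either as a transient scope or as a
transient service. It is not what GNOME actually does. The helper name
and the "app-<id>-<suffix>" unit naming are hypothetical (the naming
standards are not defined yet), and apps.slice is taken from the
hierarchy quoted above. Only the systemd-run options --user, --scope,
--unit= and --slice= are assumed to exist.

```python
import uuid
from typing import Optional


def transient_unit_command(app_id: str, argv: list[str],
                           as_service: bool = True,
                           slice_name: str = "apps.slice",
                           suffix: Optional[str] = None) -> list[str]:
    """Build a systemd-run invocation that gives one app its own cgroup.

    The unit naming scheme here is a placeholder, not a standard.
    """
    if suffix is None:
        # A random suffix keeps several instances of one app apart.
        suffix = uuid.uuid4().hex[:8]
    kind = "service" if as_service else "scope"
    cmd = ["systemd-run", "--user"]
    if not as_service:
        # Scope: systemd-run starts the process itself and systemd only
        # tracks it. Without --scope, systemd starts it as a service.
        cmd.append("--scope")
    cmd += [f"--unit=app-{app_id}-{suffix}.{kind}",
            f"--slice={slice_name}", "--", *argv]
    return cmd
```

The resulting list could be handed to subprocess.run(); the function
only builds the command so that it can be inspected without a running
user manager.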

So, there are some plans already, but many details have not been worked
out yet. At least KDE and GNOME people are looking into integrating
well with systemd.

Benjamin