Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection

Michal Hocko <mhocko@xxxxxxxxxx> · Thu, 13 Feb 2020 16:46:27 +0100

On Thu 13-02-20 08:23:17, Johannes Weiner wrote:
> On Thu, Feb 13, 2020 at 08:40:49AM +0100, Michal Hocko wrote:
> > On Wed 12-02-20 12:08:26, Johannes Weiner wrote:
> > > On Tue, Feb 11, 2020 at 05:47:53PM +0100, Michal Hocko wrote:
> > > > Unless I am missing something then I am afraid it doesn't. Say you have a
> > > > default systemd cgroup deployment (aka deeper cgroup hierarchy with
> > > > slices and scopes) and now you want to grant a reclaim protection on a
> > > > leaf cgroup (or even a whole slice that is not really important). All the
> > > > hierarchy up the tree has the protection set to 0 by default, right? You
> > > > simply cannot get that protection. You would need to configure the
> > > > protection up the hierarchy and that is really cumbersome.
> > > 
> > > Okay, I think I know what you mean. Let's say you have a tree like
> > > this:
> > > 
> > >                           A
> > >                          / \
> > >                         B1  B2
> > >                        / \   \
> > >                       C1 C2   C3
> > > 
> > > and there is no actual delegation point - everything belongs to the
> > > same user / trust domain. C1 sets memory.low to 10G, but its parents
> > > set nothing. You're saying we should honor the 10G protection during
> > > global and limit reclaims anywhere in the tree?
> > 
> > No, only in the C1 which sets the limit, because that is the woriking
> > set we want to protect.
> > 
> > > Now let's consider there is a delegation point at B1: we set up and
> > > trust B1, but not its children. What effect would the C1 protection
> > > have then? Would we ignore it during global and A reclaim, but honor
> > > it when there is B1 limit reclaim?
> > 
> > In the scheme with the inherited protection it would act as the gate
> > and require an explicit low limit setup defaulting to 0 if none is
> > specified.
> > 
> > > Doing an explicit downward propagation from the root to C1 *could* be
> > > tedious, but I can't think of a scenario where it's completely
> > > impossible. Especially because we allow proportional distribution when
> > > the limit is overcommitted and you don't have to be 100% accurate.
> > 
> > So let's see how that works in practice, say a multi workload setup
> > with a complex/deep cgroup hierachies (e.g. your above example). No
> > delegation point this time.
> > 
> > C1 asks for low=1G while using 500M, C3 low=100M using 80M.  B1 and
> > B2 are completely independent workloads and the same applies to C2 which
> > doesn't ask for any protection at all? C2 uses 100M. Now the admin has
> > to propagate protection upwards so B1 low=1G, B2 low=100M and A low=1G,
> > right? Let's say we have a global reclaim due to external pressure that
> > originates from outside of A hierarchy (it is not overcommited on the
> > protection).
> > 
> > Unless I miss something C2 would get a protection even though nobody
> > asked for it.
> 
> Good observation, but I think you spotted an unintentional side effect
> of how I implemented the "floating protection" calculation rather than
> a design problem.
> 
> My patch still allows explicit downward propagation. So if B1 sets up
> 1G, and C1 explicitly claims those 1G (low>=1G, usage>=1G), C2 does
> NOT get any protection. There is no "floating" protection left in B1
> that could get to C2.

Yeah, the saturated protection works reasonably AFAICS.

> However, to calculate the float, I'm using the utilized protection
> counters (children_low_usage) to determine what is "claimed". Mostly
> for convenience because they were already there. In your example, C1
> is only utilizing 500M of its protection, leaving 500M in the float
> that will go toward C2. I agree that's undesirable.
> 
> But it's fixable by adding a hierarchical children_low counter that
> tracks the static configuration, and using that to calculate floating
> protection instead of the dynamic children_low_usage.
> 
> That way you can propagate protection from A to C1 without it spilling
> to anybody else unintentionally, regardless of how much B1 and C1 are
> actually *using*.
> 
> Does that sound reasonable?

Please post a patch and I will think about it more to see whether I can
see more problems. I am worried this is getting more and more complex
and harder to wrap head around.

Thanks!
-- 
Michal Hocko
SUSE Labs