Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg

yulei zhang <yulei.kernel@xxxxxxxxx> · Thu, 3 Jun 2021 18:19:18 +0800

On Wed, Jun 2, 2021 at 11:39 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> On Wed, Jun 2, 2021 at 2:11 AM yulei zhang <yulei.kernel@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 1, 2021 at 10:45 PM Chris Down <chris@xxxxxxxxxxxxxx> wrote:
> > >
> > > yulei zhang writes:
> > > >Yep, dynamically adjust the memory.high limits can ease the memory pressure
> > > >and postpone the global reclaim, but it can easily trigger the oom in
> > > >the cgroups,
> > >
> > > To go further on Shakeel's point, which I agree with, memory.high should
> > > _never_ result in memcg OOM. Even if the limit is breached dramatically, we
> > > don't OOM the cgroup. If you have a demonstration of memory.high resulting in
> > > cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)
> >
> > You are right, I mistook it for max. Shakeel means the throttling during
> > context switch, which uses memory.high as the threshold to calculate the
> > sleep time. Currently it only applies to cgroup v2. In this patchset we
> > explore another idea to throttle memory usage, which relies on setting
> > an average allocation speed in memcg. We hope to suppress the memory
> > usage of low-priority cgroups when it reaches the system watermark while
> > still keeping their activities alive.
>
> I think you need to make the case: why should we add one more form of
> throttling? Basically why memory.high is not good for your use-case
> and the proposed solution works better. Though IMO it would be a hard
> sell.

Thanks. IMHO, these two throttles serve different purposes.
memory.high is a per-memcg throttle that limits the memory usage of the
tasks in the cgroup. The memory allocation speed throttle (MST), by
contrast, aims to avoid memory bursts in a cgroup that would trigger
global reclaim and hurt timing-sensitive workloads in other cgroups.
For example, say we have two pods with memory overcommit enabled, one
running online tasks and the other running offline tasks. If we restrict
the memory usage of the offline pod with memory.high, it loses the
benefit of memory overcommit when the other workloads are idle. On the
other hand, if we don't limit its memory usage, a sudden burst of
memory operations can easily breach the system watermark. With MST
enabled in this case, we can avoid direct reclaim and still take
advantage of overcommit.
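
To make the idea concrete, here is a rough userspace model of an
allocation speed throttle. It is only a sketch, not the code in this
patchset: struct mst_state, mst_charge(), the fixed accounting period
and the "sleep until the period ends" policy are all hypothetical
names and assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cgroup state: an allocation budget per fixed period. */
struct mst_state {
	uint64_t pages_per_period;	/* allowed average allocation speed */
	uint64_t period_ms;		/* length of one accounting period */
	uint64_t window_start_ms;	/* start of the current period */
	uint64_t pages_used;		/* pages charged in the current period */
};

/*
 * Charge nr_pages at time now_ms and return how long (in ms) the
 * allocating task should sleep; 0 means no throttling.
 *
 * The budget is enforced only when under_pressure is nonzero, i.e. when
 * free memory is close to the system watermark. While memory is
 * plentiful the cgroup can still use overcommitted memory freely.
 */
uint64_t mst_charge(struct mst_state *s, uint64_t now_ms,
		    uint64_t nr_pages, int under_pressure)
{
	assert(s->period_ms > 0);

	/* Start a new accounting period once the current one has elapsed. */
	if (now_ms - s->window_start_ms >= s->period_ms) {
		s->window_start_ms = now_ms;
		s->pages_used = 0;
	}

	s->pages_used += nr_pages;

	if (!under_pressure || s->pages_used <= s->pages_per_period)
		return 0;

	/* Over budget under pressure: wait for the current period to end. */
	return s->window_start_ms + s->period_ms - now_ms;
}
```

With a budget of 100 pages per 1000 ms, a task that charges 80 pages
and then 50 more at t=200 ms under pressure would be asked to sleep
800 ms, while the same burst without pressure is not delayed at all.
That is the difference from a fixed memory.high limit in this model:
usage is not capped, only the speed of growth near the watermark.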