Re: [PATCH] mm: memcontrol: asynchronous reclaim for memory.high

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 19 Feb 2020 11:31:39 -0800

On Wed, 19 Feb 2020 19:37:31 +0100 Michal Hocko <mhocko@xxxxxxxxxx> wrote:

> On Wed 19-02-20 13:12:19, Johannes Weiner wrote:
> > We have received regression reports from users whose workloads moved
> > into containers and subsequently encountered new latencies. For some
> > users these were a nuisance, but for some it meant missing their SLA
> > response times. We tracked those delays down to cgroup limits, which
> > inject direct reclaim stalls into the workload where previously all
> > reclaim was handled by kswapd.
> 
> I am curious why this is unexpected when the high limit is explicitly
> documented as a throttling mechanism.

Yes, this sounds like a feature-not-a-bug.

But what was the nature of these stalls?  If they were "stuck in D
state waiting for something" then that's throttling.  If they were
"unexpected bursts of in-kernel CPU activity" then I see a better case.
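One rough way to tell these two cases apart from userspace is to sample the stalled task's state letter from /proc/<pid>/stat while the stall is happening. Mostly 'D' suggests the task is blocked. Mostly 'R' suggests it is on or waiting for a CPU. The sketch below rests on several assumptions. The helper names (task_state, classify_stall, sample), the sample count, and the interval are all hypothetical. A simple majority vote stands in for any real analysis. It is a heuristic, not a measurement tool.

```python
import time
from collections import Counter


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split after the *last* ')' instead of on whitespace.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def classify_stall(stat_lines) -> str:
    """Hypothetical heuristic: majority vote over sampled task states.

    'D' (uninterruptible sleep) points at blocking/throttling;
    'R' (running or runnable) points at CPU activity. Anything else,
    or a tie, is reported as inconclusive.
    """
    counts = Counter(task_state(line) for line in stat_lines)
    if counts["D"] > counts["R"]:
        return "blocked"
    if counts["R"] > counts["D"]:
        return "running"
    return "inconclusive"


def sample(pid: int, n: int = 100, interval: float = 0.01):
    """Read /proc/<pid>/stat n times, interval seconds apart (Linux only)."""
    lines = []
    for _ in range(n):
        with open(f"/proc/{pid}/stat") as f:
            lines.append(f.read())
        time.sleep(interval)
    return lines
```

'R' alone does not separate user time from kernel time. Confirming "in-kernel CPU activity" would also need the stime field (field 15 in proc(5)) to grow across the samples.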