On Wed, 19 Feb 2020 19:37:31 +0100 Michal Hocko <mhocko@xxxxxxxxxx> wrote:

> On Wed 19-02-20 13:12:19, Johannes Weiner wrote:
> > We have received regression reports from users whose workloads moved
> > into containers and subsequently encountered new latencies. For some
> > users these were a nuisance, but for some it meant missing their SLA
> > response times. We tracked those delays down to cgroup limits, which
> > inject direct reclaim stalls into the workload where previously all
> > reclaim was handled by kswapd.
>
> I am curious why is this unexpected when the high limit is explicitly
> documented as a throttling mechanism.

Yes, this sounds like a feature-not-a-bug.

But what was the nature of these stalls? If they were "stuck in D
state waiting for something" then that's throttling. If they were
"unexpected bursts of in-kernel CPU activity" then I see a better case.