Re: [PATCH] mm, memcg: reclaim more aggressively before high allocator throttling

Michal Hocko <mhocko@xxxxxxxxxx> · Fri, 29 May 2020 09:31:18 +0200

On Thu 28-05-20 17:48:48, Chris Down wrote:
> Michal Hocko writes:
> > > We send a simple bug fix: bring this instance of reclaim in line with
> > > how everybody else is using the reclaim API, to meet the semantics as
> > > they are intended and documented.
> > 
> > Here is where we are not on the same page though. Once you have identified
> > that the main problem is that the reclaim fails too early to meet the
> > target then the fix would be to enforce that target. I have asked why
> > this hasn't been done and haven't got any real answer for that. Instead
> > what you call "a simple bug fix" has larger consequences which are not
> > really explained in the changelog and they are also not really trivial
> > to see. If the changelog explicitly stated that the proportional memory
> > reclaim is not sufficient because XYZ and the implementation has been
> > changed to instead meet the high limit target then this would be a
> > completely different story and I believe we could have saved some
> > discussion.
> 
> I agree that the changelog can be made more clear. Any objection if I send
> v2 with changelog changes to that effect, then? :-)

No objection, please do. And I would highly appreciate having the above
addressed, so that we do not have to scratch our heads over why a
particular design decision was made, or argue about the thinking behind it.

> > > And somehow this is controversial, and we're just changing around user
> > > promises as we see fit for our particular usecase?
> > > 
> > > I don't even understand how the supposed alternate semantics you read
> > > between the lines in the documentation would make for a useful
> > > feature: It may fail to contain a group of offending tasks to the
> > > configured limit, but it will be fair to those tasks while doing so?
> > > 
> > > > But if you really want to push this through then let's do it
> > > > properly at least. memcg->memcg_nr_pages_over_high has only very
> > > > vague meaning if the reclaim target is the high limit.
> > > 
> > > task->memcg_nr_pages_over_high is not vague, it's a best-effort
> > > mechanism to distribute fairness. It's the current task's share of the
> > > cgroup's overage, and it allows us in the majority of situations to
> > > distribute reclaim work and sleeps in proportion to how much the task
> > > is actually at fault.
> > 
> > Agreed. But this stops being the case as soon as the reclaim target has
> > been reached and new reclaim attempts are enforced because the memcg is
> > still above the high limit. Because then you have a completely different
> > reclaim target - get down to the limit. This would be especially visible
> > with a large memcg_nr_pages_over_high which could even lead to an over
> > reclaim.
> 
> We actually over reclaim even before this patch -- this patch doesn't bring
> much new in that regard.
> 
> Tracing try_to_free_pages for a cgroup at the memory.high threshold shows
> that before this change, we sometimes even reclaim on the order of twice the
> number of pages requested. For example, I see cases where we requested 1000
> pages to be reclaimed, but end up reclaiming 2000 in a single reclaim
> attempt.

This is interesting and worth looking into. I am aware that we can
potentially reclaim many more pages than requested during icache reclaim,
and that there was a heated discussion without any fix merged in the end,
IIRC. Do you have any details?
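
To make the over-reclaim concern quoted above concrete, here is a toy
userspace model. It is only a sketch, not kernel code: struct toy_memcg,
toy_reclaim() and both loops are hypothetical names invented for
illustration, and it assumes reclaim frees exactly what it is asked for
(which, per the tracing above, is not even true in practice). It compares
retrying with the task's fixed share against clamping each request to the
remaining overage.

```c
#include <stdio.h>

/* Toy model only. Nothing here is a real kernel interface. */
struct toy_memcg {
	unsigned long usage;	/* pages currently charged */
	unsigned long high;	/* high limit, in pages */
};

/* Pages above the high limit, or 0 if the group is at or below it. */
static unsigned long overage(const struct toy_memcg *cg)
{
	return cg->usage > cg->high ? cg->usage - cg->high : 0;
}

/* Assumption: reclaim frees exactly nr pages (bounded by usage). */
static unsigned long toy_reclaim(struct toy_memcg *cg, unsigned long nr)
{
	unsigned long freed = nr < cg->usage ? nr : cg->usage;

	cg->usage -= freed;
	return freed;
}

/* Retry with the task's fixed share until the group is under high. */
static unsigned long reclaim_fixed_share(struct toy_memcg *cg,
					 unsigned long share)
{
	unsigned long total = 0;

	if (!share)
		return 0;
	while (overage(cg))
		total += toy_reclaim(cg, share);
	return total;
}

/* Same loop, but never ask for more than the remaining overage. */
static unsigned long reclaim_clamped(struct toy_memcg *cg,
				     unsigned long share)
{
	unsigned long total = 0, over;

	if (!share)
		return 0;
	while ((over = overage(cg)))
		total += toy_reclaim(cg, share < over ? share : over);
	return total;
}
```

With usage at 10100 pages, a high limit of 10000 and a share of 1000, the
fixed-share loop frees 1000 pages and leaves usage at 9100, while the
clamped loop frees 100 and stops at the limit. Whether clamping is the
right behaviour for the real code is exactly what the changelog should
spell out.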

-- 
Michal Hocko
SUSE Labs