On Wed 08-07-20 10:57:27, David Rientjes wrote:
> On Wed, 8 Jul 2020, Michal Hocko wrote:
> 
> > I have only now realized that David is not on Cc. Add him here. The
> > patch is http://lkml.kernel.org/r/1594214649-9837-1-git-send-email-laoar.shao@xxxxxxxxx.
> > 
> > I believe the main problem is that we are normalizing to oom_score_adj
> > units rather than usage/total. I have a very vague recollection this has
> > been done in the past but I didn't get to dig into details yet.
> 
> The memcg max is 4194304 pages, and an oom_score_adj of -998 would yield a
> page adjustment of:
> 
> 	adj = -998 * 4194304 / 1000 = -4185915 pages
> 
> The largest pid 58406 (data_sim) has rss 3967322 pages,
> pgtables 37101568 / 4096 = 9058 pages, and swapents 0. So its unadjusted
> badness is
> 
> 	3967322 + 9058 pages = 3976380 pages
> 
> Factoring in oom_score_adj, all of these processes will have a badness of
> 1 because oom_badness() doesn't underflow, which I think is the point of
> Yafang's proposal.
> 
> I think the patch can work but, as you mention, also needs an update to
> proc_oom_score(). proc_oom_score() is using the global amount of memory,
> so Yafang is likely not seeing it go negative for that reason, but it
> could happen.

Yes, memcg just makes it more obvious, but the same might happen for the
global case.

I am not sure how we can both allow underflow and present a value that
fits the existing model. The exported value should really reflect what
the oom killer is using for the calculation, or we are going to see
discrepancies between the real oom decision and the presented values. So
I believe we really have to change the calculation rather than just make
it tolerant to underflows. But I have to think about that much more.

-- 
Michal Hocko
SUSE Labs
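
[Editor's illustration] The following is a minimal standalone sketch of the
arithmetic quoted above, not the kernel's actual oom_badness()
implementation; the helper name sketch_badness() is hypothetical and the
numbers are the ones David cites in this thread. It shows how the
oom_score_adj adjustment is normalized to the memcg limit and how the
current clamp-to-1 behaviour hides the underflow:

#include <stdio.h>

/*
 * Simplified model of the badness computation discussed above:
 * points = rss + pgtables + swapents, then the oom_score_adj
 * adjustment scaled to totalpages is added, and the result is
 * clamped so it never drops below 1.
 */
static long long sketch_badness(long long rss, long long pgtables,
				long long swapents, long long oom_score_adj,
				long long totalpages)
{
	long long points = rss + pgtables + swapents;

	/* adj = -998 * 4194304 / 1000 = -4185915 pages */
	points += oom_score_adj * totalpages / 1000;

	/* current behaviour: never report less than 1, so underflow is hidden */
	return points > 0 ? points : 1;
}

int main(void)
{
	/* pid 58406 (data_sim): rss 3967322, pgtables 9058, swapents 0 */
	long long points = sketch_badness(3967322, 9058, 0, -998, 4194304);

	/* prints 1: 3976380 - 4185915 = -209535 is negative, so it clamps */
	printf("badness = %lld\n", points);
	return 0;
}

Under these assumptions, every task in the memcg collapses to a badness of
1, which is why letting the value go negative (as the proposal does) makes
the real ordering visible, and why proc_oom_score() then needs a matching
update.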