Re: [PATCH] mm, memcg: show memcg min setting in oom messages

On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > A task running in a memcg may OOM because of the memory.min settings of its
> > sibling and parent. If this happens, the current OOM messages can't show
> > why the file page cache can't be reclaimed.
>
> The min limit is not the only way to protect memory from being reclaimed.
> The memory might be pinned or unreclaimable for other reasons (e.g. swap
> quota exceeded for memcg).

Both swap and unreclaimable (unevictable) memory are printed in the OOM
messages. If something else can prevent the file cache from being reclaimed,
we'd better show it as well.

> Besides that, there is the very same problem
> with the global OOM killer, right? And I do not expect we want to print
> all memcgs in the system (this might be hundreds).
>

I forgot the global oom...

Why not just print the memcgs which are under memory.min protection, or
something like the total amount of min-protected memory?

> > So it is better to show the memcg
> > min settings.
> > Let's take an example.
> >       bar    bar/memory.max = 1200M memory.min=800M
> >      /  \
> >    barA barB barA/memory.min = 800M memory.current=1G (file page cache)
> >              barB/memory.min = 0 (process in this memcg is allocating page)
> >
> > The process will do memcg reclaim if bar/memory.max is reached. Once
> > barA/memory.min is reached it will stop reclaiming file page cache in
> > barA, and if there are no reclaimable pages in bar and bar/barB it will
> > then enter memcg OOM.
> > After this patch, the messages below will be shown (only the relevant
> > messages are included here). The lines beginning with '#' are newly added
> > info (the '#' symbol is not in the original messages).
> >       memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> >       ...
> >       # Memory cgroup min setting:
> >       # /bar: min 819200KB emin 0KB
> >       # /bar/barA: min 819200KB emin 819200KB
> >       # /bar/barB: min 0KB emin 0KB
> >       ...
> >       Memory cgroup stats for /bar:
> >       anon 418328576
> >       file 835756032
> >       ...
> >       unevictable 0
> >       ...
> >       oom-kill:constraint=CONSTRAINT_MEMCG..oom_memcg=/bar,task_memcg=/bar/barB
> >
> > With the newly added information, we can see that the memory.min in
> > bar/barA is reached and the processes in bar/barB can't reclaim file page
> > cache from bar/barA any more. Without this new information we don't know
> > why the file page cache in bar can't be reclaimed.
>
> Well, I am not sure this is really useful enough TBH. It doesn't give
> you the whole picture and it potentially generates a lot of output in
> the oom report. FYI we used to have a more precise break down of
> counters in memcg hierarchy, see 58cf188ed649 ("memcg, oom: provide more
> precise dump info while memcg oom happening") which later got rewritten
> by c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM")
>

At least we'd better print the total protected memory in the OOM messages.
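
To make the "total protected memory" idea concrete, here is a rough
user-space sketch using the numbers from the example report in the commit
log. It is not the patch and not kernel code: struct memcg_model,
total_protected_kb() and unprotected_kb() are hypothetical names, and the
rule of summing emin over the direct children of the OOM memcg is an
assumption made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; NOT the kernel's struct mem_cgroup. */
struct memcg_model {
	const char *path;      /* cgroup path as printed in the OOM report */
	const char *parent;    /* parent path, NULL for the OOM memcg itself */
	unsigned long min_kb;  /* configured memory.min, in KB */
	unsigned long emin_kb; /* effective min ("emin" in the report), in KB */
};

/* Values copied from the example report in the commit log. */
const struct memcg_model example[] = {
	{ "/bar",      NULL,   819200, 0      },
	{ "/bar/barA", "/bar", 819200, 819200 },
	{ "/bar/barB", "/bar", 0,      0      },
};

#define EXAMPLE_COUNT (sizeof(example) / sizeof(example[0]))

/*
 * Assumption (illustration only): the memory protected during a memcg OOM
 * in oom_memcg is the sum of emin over its direct children.
 */
unsigned long total_protected_kb(const struct memcg_model *cgs, size_t n,
				 const char *oom_memcg)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cgs[i].parent && strcmp(cgs[i].parent, oom_memcg) == 0)
			sum += cgs[i].emin_kb;
	}
	return sum;
}

/* Usage not covered by min protection; 0 if protection exceeds usage. */
unsigned long unprotected_kb(unsigned long usage_kb,
			     unsigned long protected_kb)
{
	return usage_kb > protected_kb ? usage_kb - protected_kb : 0;
}
```

For the example, total_protected_kb(example, EXAMPLE_COUNT, "/bar") gives
819200, and with the reported usage of 1228800kB that leaves 409600 kB
outside min protection, which is roughly the size of the anon figure
(418328576 bytes, i.e. 408524 kB). A single summary line like that would
avoid printing every memcg.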

> Could you be more specific about why you really need this piece of
> information?

As I said in the commit log, we don't know why the file cache can't be
reclaimed (when unevictable is 0 and dirty is 0 as well).

>
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> >  mm/memcontrol.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> --
> Michal Hocko
> SUSE Labs



