Re: [PATCH 3/4] mm: Centralize & improve oom reporting in show_mem.c

Michal Hocko <mhocko@xxxxxxxx> · Fri, 22 Apr 2022 11:27:05 +0200

On Fri 22-04-22 04:30:37, Kent Overstreet wrote:
> On Fri, Apr 22, 2022 at 10:03:36AM +0200, Michal Hocko wrote:
> > On Thu 21-04-22 14:42:13, Kent Overstreet wrote:
> > > On Thu, Apr 21, 2022 at 11:18:20AM +0200, Michal Hocko wrote:
> > [...]
> > > > > 00177 16644 pages reserved
> > > > > 00177 Unreclaimable slab info:
> > > > > 00177 9p-fcall-cache    total: 8.25 MiB active: 8.25 MiB
> > > > > 00177 kernfs_node_cache total: 2.15 MiB active: 2.15 MiB
> > > > > 00177 kmalloc-64        total: 2.08 MiB active: 2.07 MiB
> > > > > 00177 task_struct       total: 1.95 MiB active: 1.95 MiB
> > > > > 00177 kmalloc-4k        total: 1.50 MiB active: 1.50 MiB
> > > > > 00177 signal_cache      total: 1.34 MiB active: 1.34 MiB
> > > > > 00177 kmalloc-2k        total: 1.16 MiB active: 1.16 MiB
> > > > > 00177 bch_inode_info    total: 1.02 MiB active: 922 KiB
> > > > > 00177 perf_event        total: 1.02 MiB active: 1.02 MiB
> > > > > 00177 biovec-max        total: 992 KiB active: 960 KiB
> > > > > 00177 Shrinkers:
> > > > > 00177 super_cache_scan: objects: 127
> > > > > 00177 super_cache_scan: objects: 106
> > > > > 00177 jbd2_journal_shrink_scan: objects: 32
> > > > > 00177 ext4_es_scan: objects: 32
> > > > > 00177 bch2_btree_cache_scan: objects: 8
> > > > > 00177   nr nodes:          24
> > > > > 00177   nr dirty:          0
> > > > > 00177   cannibalize lock:  0000000000000000
> > > > > 00177 
> > > > > 00177 super_cache_scan: objects: 8
> > > > > 00177 super_cache_scan: objects: 1
> > > > 
> > > > How does this help to analyze this allocation failure?
> > > 
> > > You asked for an example of the output, which was an entirely reasonable
> > > request. Shrinkers weren't responsible for this OOM, so it doesn't help here -
> > 
> > OK, do you have an example where it clearly helps?
> 
> I've debugged quite a few issues with shrinkers over the years where this would
> have helped a lot (especially if it was also in sysfs), although nothing
> currently. I was just talking with Dave earlier tonight about more things that
> could be added for shrinkers, but I'm going to have to go over that conversation
> again and take notes.
> 
> Also, I feel I have to point out that OOM & memory reclaim debugging is an area
> where many filesystem developers feel that the MM people have been dropping the
> ball, and your initial response to this patch series...  well, it feels like
> more of the same.

Not sure where you get that feeling. Debugging memory reclaim is a PITA
because many problems can be indirect and tools we have available are
not really great. I do not remember MM people blocking the addition of
useful debugging tools.

> Still does to be honest, you're coming across like I haven't been working in
> this area for a decade+ and don't know what I'm touching. Really, I'm not new to
> this stuff.

I am sorry to hear that but there certainly is no intention like that
and TBH I do not even see where you get that feeling. You have posted a
changelog which doesn't really explain much. I am aware that you are far
from a kernel newbie and therefore I would really expect much more in
that regard.

> > > are you asking me to explain why shrinkers are relevant to OOMs and memory
> > > reclaim...?
> > 
> > No, not really, I guess that is quite clear. The thing is that the oom
> > report is quite bloated already and we should be rather picky on what to
> > dump there. Your above example is a good one here. You have an order-5
> > allocation failure and that can be caused by almost anything. Compaction
> > not making progress for many reasons - e.g. internal fragmentation caused
> > by pinned pages but also kmalloc allocations. The above output doesn't
> > help with any of that. Could shrinker operation be related? Of course
> > it could but how can I tell?
> 
> Yeah sure and internal fragmentation would actually be an _excellent_ thing to
> add to the show_mem report.

Completely agreed. The only information we currently have is the
buddyinfo part which reports movability status but I do not think this
is remotely sufficient.

[...]

> > If we are lucky enough the oom is reproducible and additional
> > tracepoints (or whatever you prefer to use) tell us more. Far from
> > optimal, no question about that but I do not have a good answer on
> > where the threshold should really be. Maybe we can come up with some
> > trigger-based mechanism (e.g. some shrinkers are failing so they
> > register their debugging data which will get dumped on the OOM) which
> > would enable certain debugging information or something like that.
> 
> Why would we need a trigger mechanism?

Mostly because reasons for reclaim failures can vary a lot and the oom
report code has no idea what has happened during
reclaim/compaction.
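
For illustration only, the trigger idea could look roughly like the
userspace sketch below. Nothing like this exists in the tree:
oom_debug_register(), oom_debug_dump(), the fixed-size table and the
example callback are all hypothetical names. A subsystem registers a
callback only once it has noticed trouble, and the report then calls
whatever has been registered.

```c
#include <stddef.h>
#include <stdio.h>

#define OOM_DEBUG_MAX 16	/* hypothetical table size */

/* Callback fills buf with a one-line, NUL-terminated description. */
typedef int (*oom_debug_fn)(char *buf, size_t len, void *priv);

struct oom_debug_entry {
	const char *name;
	oom_debug_fn fn;
	void *priv;
};

static struct oom_debug_entry oom_debug_tbl[OOM_DEBUG_MAX];
static size_t oom_debug_nr;

/* Called by a subsystem once it sees a problem; returns -1 if the table is full. */
static int oom_debug_register(const char *name, oom_debug_fn fn, void *priv)
{
	if (oom_debug_nr == OOM_DEBUG_MAX)
		return -1;
	oom_debug_tbl[oom_debug_nr++] = (struct oom_debug_entry){ name, fn, priv };
	return 0;
}

/* Called from the report path; returns the number of entries dumped. */
static size_t oom_debug_dump(FILE *out)
{
	char buf[256];
	size_t i;

	for (i = 0; i < oom_debug_nr; i++) {
		buf[0] = '\0';
		oom_debug_tbl[i].fn(buf, sizeof(buf), oom_debug_tbl[i].priv);
		fprintf(out, "%s: %s\n", oom_debug_tbl[i].name, buf);
	}
	return i;
}

/* Example callback: priv points at a counter of failed scans. */
static int failing_shrinker_report(char *buf, size_t len, void *priv)
{
	unsigned long *failed = priv;

	return snprintf(buf, len, "scan failures: %lu", *failed);
}
```

With nothing registered the report stays as short as it is today; only
subsystems that flagged a problem add lines to it.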

> Could you explain your objection to simply unconditionally dumping the top 10
> slabs and the top 10 shrinkers?

We already do that in some form. We dump unreclaimable slabs if they
consume more memory than user pages on LRUs. We also dump all slab
caches with some objects. Why is this approach not good? Should we tweak
the condition to dump or should we limit the dump? These are reasonable
questions to ask. Your patch has dropped those without explaining any
of the motivation.

I am perfectly OK with modifying should_dump_unreclaim_slab to dump even
if the slab memory consumption is lower. Also dumping small caches with
a handful of objects can be excessive.
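
To make those two knobs concrete, here is a rough userspace sketch. It
is not the kernel code: struct mem_snapshot, should_dump_slab() and
cache_worth_dumping() are hypothetical names, and ratio_pct and
min_bytes are assumed tunables. It shows the "unreclaimable slab vs.
user pages on LRUs" comparison relaxed by a percentage, plus a filter
that skips caches too small to matter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the counters the check needs, in pages. */
struct mem_snapshot {
	unsigned long slab_unreclaimable;
	unsigned long lru_user_pages;	/* anon + file pages on the LRUs */
};

/*
 * Dump when unreclaimable slab exceeds ratio_pct percent of the user
 * pages on the LRUs. ratio_pct == 100 corresponds to "slab consumes
 * more memory than user pages"; a lower value dumps earlier.
 */
static bool should_dump_slab(const struct mem_snapshot *s, unsigned int ratio_pct)
{
	return (unsigned long long)s->slab_unreclaimable * 100 >
	       (unsigned long long)s->lru_user_pages * ratio_pct;
}

/* Skip caches whose total footprint is below min_bytes. */
static bool cache_worth_dumping(size_t total_bytes, size_t min_bytes)
{
	return total_bytes >= min_bytes;
}
```

For example, with 600 slab pages against 1000 LRU pages nothing is
dumped at ratio_pct = 100, while ratio_pct = 50 would dump.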

Wrt shrinkers I really do not know what kind of shrinker data would
be useful to dump and when. Therefore I am asking for examples.
-- 
Michal Hocko
SUSE Labs