Re: [PATCH v2 8/8] mm: Centralize & improve oom reporting in show_mem.c

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Fri, 22 Apr 2022 19:48:20 -0400

On Fri, Apr 22, 2022 at 08:09:48AM -0700, Roman Gushchin wrote:
> To add a concern: largest shrinkers are usually memcg-aware. Scanning
> over the whole cgroup tree (with potentially hundreds or thousands of cgroups)
> and over all shrinkers from the oom context sounds like a bad idea to me.

Why would we be scanning over the whole cgroup tree? We don't do that in the
vmscan code, nor the new report. If the OOM is for a specific cgroup, we should
probably only be reporting on memory usage for that cgroup (show_mem() is not
currently cgroup aware, but perhaps it should be).

> IMO it's more appropriate to do from userspace by oomd or a similar daemon,
> well before the in-kernel OOM kicks in.

The reason I've been introducing printbufs and the .to_text() method was
specifically to make this code general enough to be available from
sysfs/debugfs - so I see no reasons why a userspace oomd couldn't make use of it
as well.

> > Last but not least let me echo the concern from the other reply. Memory
> > allocations are not really reasonable to be done from the oom context so
> > the pr_buf doesn't sound like a good tool here.
> 
> +1

In my experience, it's rare to be _so_ out of memory that small kmalloc
allocations are failing - we'll be triggering the show_mem() report before that
happens.

However, if this turns out not to be the case in practice, or if there's a
consensus now that we really want to guard against this, I have some thoughts.
We could either:

 - mempool-ify printbufs as a whole

 - reserve some memory just for the show_mem() report, which would mean either
   adding support to printbuf for external buffers (subsuming what seq_buf
   does), or shrinker .to_text() methods would have to output to seq_buf instead
   of printbuf (ew, API fragmentation).