Re: [PATCH 0/3] OOM detection rework v4

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 16 Dec 2015 15:35:13 -0800

On Tue, 15 Dec 2015 19:19:43 +0100 Michal Hocko <mhocko@xxxxxxxxxx> wrote:

> This is an attempt to make the OOM detection more deterministic and
> easier to follow because each reclaimer basically tracks its own
> progress which is implemented at the page allocator layer rather spread
> out between the allocator and the reclaim. The more on the implementation
> is described in the first patch.

We've been futzing with this stuff for many years and it still isn't
working well.  This makes me expect that the new implementation will
take a long time to settle in.

To aid and accelerate this process I suggest we lard this code up with
lots of debug info, so when someone reports an issue we have the best
possible chance of understanding what went wrong.

This is easy in the case of oom-too-early - it's all slowpath code and
we can just do printk(everything).  It's not so easy in the case of
oom-too-late-or-never.  The reporter's machine just hangs or it
twiddles thumbs for five minutes then goes oom.  But there are things
we can do here as well, such as:

- add an automatic "nearly oom" detection which detects when things
  start going wrong and turns on diagnostics (this would need an enable
  knob, possibly in debugfs).

- forget about an autodetector and simply add a debugfs knob to turn on
  the diagnostics.

- sprinkle tracepoints everywhere and provide a set of
  instructions/scripts so that people who know nothing about kernel
  internals or tracing can easily gather the info we need to understand
  issues.

- add a sysrq key to turn on diagnostics.  Pretty essential when the
  machine is comatose and doesn't respond to keystrokes.

- something else

So...  please have a think about it?  What can we add in here to make it
as easy as possible for us (ie: you ;)) to get this code working well? 
At this time, too much developer support code will be better than too
little.  We can take it out later on.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>