RE: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter

PINTU KUMAR <pintu.k@xxxxxxxxxxx> · Fri, 09 Oct 2015 18:29:49 +0530

> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
> Sent: Thursday, October 08, 2015 10:01 PM
> To: PINTU KUMAR
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx;
> koct9i@xxxxxxxxx; rientjes@xxxxxxxxxx; hannes@xxxxxxxxxxx; penguin-
> kernel@xxxxxxxxxxxxxxxxxxx; bywxiaobai@xxxxxxx; mgorman@xxxxxxx;
> vbabka@xxxxxxx; js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx;
> fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx;
> sreenathd@xxxxxxxxxxx
> Subject: Re: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter
> 
> On Thu 08-10-15 21:36:24, PINTU KUMAR wrote:
> [...]
> > However, these OOM logs were not found in /var/log/messages.
> > Maybe that is because we log heavily: in the ageing test we enable maximum
> > functionality (WiFi, BT, GPS, fully loaded system).
> 
> If you swamp your logs so heavily that even critical messages won't make it
> into the log files, then your logging is basically useless for anything
> serious. But that is not really that important.
> 
> > Hope, it is clear now. If not, please ask me for more information.
> >
> > >
> > > > Dumping the logs every time is not feasible. Instead of counting
> > > > manually in the log file, we wanted to know the number of OOM kills
> > > > that happened during these tests.
> > > > So we decided to add a counter in /proc/vmstat to track kernel OOM
> > > > kills, and to monitor it during our ageing test.
> > > >
> > > > Basically, we wanted to tune our user-space LMK (low memory killer) for
> > > > different threshold values, so that we can completely avoid kernel OOM
> > > > kills. Just by looking at this counter, we could tune the LMK threshold
> > > > values without depending on the kernel log messages.
> > >
> > > Wouldn't a trace point suit you better for this particular use case
> > > considering this is a testing environment?
> > >
> > Tracing for the oom_kill count?
> > Tracing-related configs are normally disabled in release binaries.
> 
> Yes but your use case described a testing environment.
> 
> > It is also not always feasible to run tracing for such long-duration tests.
> 
> I do not see why long duration would be a problem. Each tracepoint can be
> enabled separately.
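For illustration, a minimal Python sketch of enabling a single tracepoint through tracefs and counting its hits in the trace output. It assumes tracefs is mounted at /sys/kernel/tracing or /sys/kernel/debug/tracing and that the caller is root. The event name is a placeholder, since no specific OOM tracepoint is named in this thread, and all helper names are hypothetical.

```python
import os
from typing import Iterable, Optional

# Usual tracefs mount points; which one exists depends on kernel and setup.
TRACEFS_CANDIDATES = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def find_tracefs() -> Optional[str]:
    """Return the first candidate directory that looks like a tracefs mount."""
    for path in TRACEFS_CANDIDATES:
        if os.path.isdir(os.path.join(path, "events")):
            return path
    return None


def event_enable_path(tracefs: str, system: str, event: str) -> str:
    """Per-event switch: <tracefs>/events/<system>/<event>/enable."""
    return os.path.join(tracefs, "events", system, event, "enable")


def enable_event(tracefs: str, system: str, event: str) -> None:
    """Enable just this one tracepoint (requires root)."""
    with open(event_enable_path(tracefs, system, event), "w") as f:
        f.write("1")


def count_event(lines: Iterable[str], event: str) -> int:
    """Count trace lines whose event field is `event`.

    ftrace prints "<task>-<pid> [cpu] <flags> <timestamp>: <event>: <fields>",
    so a simple substring match on " <event>:" is used here.
    """
    needle = f" {event}:"
    return sum(1 for line in lines if needle in line)
```

Only the one event is switched on, so the overhead during a long ageing run is limited to that tracepoint; the lines of <tracefs>/trace can be passed to count_event at the end of the run.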
> 
> > Then the same argument would apply to the other counters as well.
> >
> > > > Also, on most systems /var/log/messages is not present and we
> > > > depend only on the kernel dmesg output, which is pretty small for
> > > > longer runs. Even if we reduce the loglevel to 4, it may not be
> > > > enough to capture all logs.
> > >
> > > Hmm, I would consider a logless system considerably crippled, but I
> > > see your point and I can imagine that especially small devices might
> > > try to save every single byte of storage. Such a system is basically
> > > undebuggable IMO, but it might still be interesting to see OOM killer
> > > traces.
> > >
> > Exactly. Some small embedded systems have 512MB, 256MB, 128MB of RAM,
> > or even less, and 8GB of storage or less.
> > On such systems we cannot afford heavy log files, and exact tuning and
> > stability are most important.
> 
> And that is what log level is for. If your logs are heavy with error levels,
> then you are far from being production ready... ;)
> 
> > All tracing/profiling configs are also disabled or set to the lowest level
> > to reduce kernel code size.
> 
> What level is that? crit? Is err really that noisy?
> 
No, I was talking about kernel configs. We normally keep some profiling/tracing
related configs disabled on low-memory systems to save kernel code size.
The point is that not every system can depend heavily on logging and tracing;
otherwise, the other counters would not be required either.
We thought that the /proc/vmstat output (which is available on practically all
systems, small or big, embedded or non-embedded) can quickly tell us what
really happened.
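A minimal Python sketch of that workflow, assuming the usual /proc/vmstat format of one "name value" pair per line: snapshot the counters before and after one ageing-test run and compare. The counter name nr_oom_victims is only the name proposed in this thread, so it is not present in an unpatched kernel, and the helper names are hypothetical.

```python
from typing import Dict, Optional


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        try:
            counters[name] = int(value)
        except ValueError:
            continue  # skip non-numeric values
    return counters


def read_vmstat(path: str = "/proc/vmstat") -> Dict[str, int]:
    """Take one snapshot of all vmstat counters."""
    with open(path) as f:
        return parse_vmstat(f.read())


def counter_delta(before: Dict[str, int], after: Dict[str, int],
                  name: str) -> Optional[int]:
    """How much counter `name` grew between two snapshots; None if absent."""
    if name not in before or name not in after:
        return None
    return after[name] - before[name]
```

For example: take before = read_vmstat(), run one test iteration with a given LMK threshold, take after = read_vmstat(), then check counter_delta(before, after, "nr_oom_victims"). A result of None means the running kernel has no such counter.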

> [...]
> > > > OK, you are suggesting splitting the oom_kill counter into two parts
> > > > (global & memcg)?
> > > > Maybe something like:
> > > > nr_oom_victims
> > > > nr_memcg_oom_victims
> > >
> > > You do not need the latter. The memcg interface already provides you with
> > > a notification API and if a counter is _really_ needed then it
> > > should be per-memcg, not a global cumulative number.
> >
> > OK, for memory cgroups, do you mean this one?
> > sh-3.2# cat /sys/fs/cgroup/memory/memory.oom_control
> > oom_kill_disable 0
> > under_oom 0
> 
> Yes this is the notification API.
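For reference, a minimal Python sketch of using that notification API from user space. It follows the eventfd registration described in the kernel's cgroup-v1 memory controller documentation: write "<event_fd> <fd of memory.oom_control>" to cgroup.event_control, then read the eventfd. It assumes a cgroup v1 memory hierarchy like the one shown above, Linux, and Python 3.10+ for os.eventfd. The helper names are hypothetical.

```python
import os
from typing import Dict


def parse_oom_control(text: str) -> Dict[str, int]:
    """Parse memory.oom_control output, e.g. lines like 'under_oom 0'."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def wait_for_memcg_oom(cgroup_dir: str) -> int:
    """Block until the memcg in cgroup_dir signals OOM (cgroup v1 only).

    Returns the eventfd counter, i.e. the number of OOM notifications
    accumulated since registration.
    """
    efd = os.eventfd(0)
    ofd = os.open(os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY)
    try:
        # Registration string: "<event_fd> <fd of memory.oom_control>"
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(f"{efd} {ofd}")
        return os.eventfd_read(efd)  # blocks until the kernel signals
    finally:
        os.close(ofd)
        os.close(efd)
```

Because the registration is per cgroup, this gives the per-memcg view suggested above rather than a global cumulative number.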
> 
> > I am actually confused about what to do next.
> > Shall I push a new patch set with just the nr_oom_victims counter?
> 
> Yes, you can repost with a better description of the typical usage scenarios.
> I cannot say I would be completely sold on this because the only relevant
> use case I've heard so far is the logless system, which is pretty much a
> corner case. This is not a reason to nack it though. It is definitely better
> than the original oom_stall suggestion because it has a clear semantic at
> least.

OK, thank you very much for your suggestions.
I agree that oom_stall is not so important.
I will try to submit a new patch set with only _nr_oom_victims_, with a
description of the use cases I came across.
If anybody can point out other use cases, please let me know.
I will be happy to try them and share the results.

> --
> Michal Hocko
> SUSE Labs
