Hi,

> -----Original Message-----
> From: David Rientjes [mailto:rientjes@xxxxxxxxxx]
> Sent: Thursday, October 15, 2015 3:35 AM
> To: PINTU KUMAR
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx;
> mhocko@xxxxxxx; koct9i@xxxxxxxxx; hannes@xxxxxxxxxxx;
> penguin-kernel@i-love.sakura.ne.jp; bywxiaobai@xxxxxxx; mgorman@xxxxxxx;
> vbabka@xxxxxxx; js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx;
> fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx
> Subject: RE: [RESEND PATCH 1/1] mm: vmstat: Add OOM victims count in
> vmstat counter
>
> On Wed, 14 Oct 2015, PINTU KUMAR wrote:
>
> > For me it was very helpful during sluggish and long-duration ageing
> > tests. With this, I don't have to look into the logs manually.
> > I just monitor this count in a script.
> > The moment I get nr_oom_victims > 1, I know that a kernel OOM must
> > have happened and I need to take the log dump.
> > So, then I do: dmesg >> oom_logs.txt
> > Or even stop the tests for further tuning.
>
> I think eventfd(2) was created for that purpose, to avoid the constant
> polling that you would have to do to check nr_oom_victims and then take
> a snapshot.
>
> > > I disagree with this one, because we can encounter oom kills due to
> > > fragmentation rather than low memory conditions for high-order
> > > allocations. The amount of free memory may be substantially higher
> > > than all zone watermarks.
> >
> > AFAIK, kernel oom happens only for lower-order allocations
> > (<= PAGE_ALLOC_COSTLY_ORDER).
> > For higher-order allocations we get a page allocation failure.
>
> Order-3 is included. I've seen machines with _gigabytes_ of free memory
> in ZONE_NORMAL on a node and have an order-3 page allocation failure
> that called the oom killer.

Yes, if PAGE_ALLOC_COSTLY_ORDER is defined as 3, then order-3 will be
included for OOM. But that's fine. We are just interested in knowing
whether the system ever entered the OOM state. That is the reason I
earlier also added _oom_stall_, to know if the system entered OOM but
ended in a page allocation failure instead of an OOM kill.

> > > We've long had a desire to have a better oom reporting mechanism
> > > rather than just the kernel log. It seems like you're feeling the
> > > same pain. I think it would be better to have an eventfd notifier
> > > for system oom conditions so we can track kernel oom kills (and
> > > conditions) in userspace. I have a patch for that, and it works
> > > quite well when userspace is mlocked with a buffer in memory.
> >
> > Ok, this would be interesting.
> > Can you point me to the patches?
> > I will quickly check if it is useful for us.
>
> https://lwn.net/Articles/589404. It's invasive and isn't upstream. I
> would like to restructure that patchset to avoid the memcg trickery and
> allow for a root-only eventfd(2) notification through procfs on system
> oom.

I am interested only in the global OOM case, not the memcg one. We have
memcg enabled, but I think even a memcg OOM will finally invoke
_oom_kill_process_. So, I am interested in a patchset that can trigger
notifications from oom_kill_process, as soon as any victim is killed.
Sorry, from your patchset I could not actually locate the system_oom
notification patch. If you have a similar patchset, please point me to
it. It will be really helpful.

Thank you!
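P.S. For reference, the polling workflow described above boils down to
something like the sketch below. This is only a minimal illustration: it
assumes this patch is applied (so nr_oom_victims appears in /proc/vmstat),
and the poll interval and log path are arbitrary examples.

/* Watch nr_oom_victims in /proc/vmstat; snapshot dmesg when it rises. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long read_oom_victims(void)
{
        FILE *f = fopen("/proc/vmstat", "r");
        char line[128];
        long val = -1;

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "nr_oom_victims %ld", &val) == 1)
                        break;
        fclose(f);
        return val;
}

int main(void)
{
        long last = read_oom_victims();

        for (;;) {
                long cur = read_oom_victims();

                if (cur > last) {
                        /* An OOM kill happened: save the kernel log. */
                        system("dmesg >> oom_logs.txt");
                        last = cur;
                }
                sleep(5);
        }
        return 0;
}

This is exactly the constant polling that an eventfd(2)-based
notification, as David suggests, would avoid.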
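For comparison, the eventfd(2) notification that already exists for memcg
(cgroup v1 memory.oom_control) is registered roughly as below. Note it
only fires for a memcg OOM, not for a global/system OOM, which is exactly
the gap discussed above. A minimal sketch, assuming the v1 memory
controller is mounted at /sys/fs/cgroup/memory and a hypothetical group
"mygroup"; error handling omitted.

/* Register an eventfd for OOM events in one memcg, then wait. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/eventfd.h>

int main(void)
{
        const char *grp = "/sys/fs/cgroup/memory/mygroup"; /* example */
        char path[256], buf[64];
        uint64_t count;
        int efd, ofd, cfd;

        efd = eventfd(0, 0);

        snprintf(path, sizeof(path), "%s/memory.oom_control", grp);
        ofd = open(path, O_RDONLY);

        /* Write "<eventfd fd> <oom_control fd>" into cgroup.event_control */
        snprintf(path, sizeof(path), "%s/cgroup.event_control", grp);
        cfd = open(path, O_WRONLY);
        snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
        write(cfd, buf, strlen(buf));

        /* Blocks until this memcg hits OOM; no polling needed. */
        read(efd, &count, sizeof(count));
        printf("memcg oom events: %llu\n", (unsigned long long)count);
        return 0;
}

A root-only system-oom equivalent, as David describes it, would
presumably replace the cgroup files here with a procfs file.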