Hi,

Sorry, I forgot to mention the V2 update.
I will highlight the V2 changes and RESEND.

> -----Original Message-----
> From: Pintu Kumar [mailto:pintu.k@xxxxxxxxxxx]
> Sent: Monday, October 12, 2015 7:03 PM
> To: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx;
> pintu.k@xxxxxxxxxxx; mhocko@xxxxxxx; koct9i@xxxxxxxxx;
> rientjes@xxxxxxxxxx; hannes@xxxxxxxxxxx;
> penguin-kernel@i-love.sakura.ne.jp; bywxiaobai@xxxxxxx; mgorman@xxxxxxx;
> vbabka@xxxxxxx; js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx;
> fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
> Cc: cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx;
> sreenathd@xxxxxxxxxxx
> Subject: [PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat counter
>
> This patch maintains a count of OOM kill victims in /proc/vmstat.
> Currently, we depend on the kernel logs to learn that a kernel OOM occurred.
> But a kernel OOM can go unnoticed by the developer, as it can silently
> kill some background applications/services.
> On some small embedded systems, the OOM may be captured in the logs but
> later overwritten because the log is a ring buffer.
> Thus, this interface helps the user quickly determine whether any OOM kill
> has happened in the past, that is, whether the system has ever entered
> the OOM kill stage to date.
>
> It can be beneficial in the following cases:
> 1. The user can monitor kernel OOM kills without looking into the
> kernel logs.
> 2. It can help in tuning the watermark level in the system.
> 3. It can help in tuning the low memory killer behavior in user space.
> 4. It can be helpful on a logless system or if klogd logging
> (/var/log/messages) is disabled.
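
To illustrate case 1 above, a user-space check could look roughly like the
sketch below. It assumes the counter is exported under the name
nr_oom_victims, as this patch proposes. The helper name read_vm_counter and
its optional file argument are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch only: read one counter from a vmstat-style file ("name value" per line).
# read_vm_counter is a hypothetical helper, not part of the patch.
#   $1 = counter name
#   $2 = file to read (optional, defaults to /proc/vmstat)
read_vm_counter() {
    vmstat_file="${2:-/proc/vmstat}"
    # Fail quietly if the file is missing or unreadable.
    [ -r "$vmstat_file" ] || return 1
    # Match the counter name exactly and print its value.
    awk -v name="$1" '$1 == name { print $2; exit }' "$vmstat_file"
}

# Prints the victim count, or "n/a" when the running kernel does not
# export the counter (for example, a kernel without this patch).
count=$(read_vm_counter nr_oom_victims) || count=""
echo "nr_oom_victims: ${count:-n/a}"
```

If this is run periodically, for example from a test script, a change in the
value between two reads would indicate that an OOM kill happened in that
interval, without any need to parse the kernel log.
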
>
> A snapshot of the result of a 3-day overnight test is shown below:
> System: ARM Cortex A7, 1GB RAM, 8GB EMMC
> Linux: 3.10.xx
> Category: reference smart phone device
> Loglevel: 7
> Conditions: Fully loaded, BT/WiFi/GPS ON
> Tests: auto launching of ~30+ apps using test scripts, in a loop for
> 3 days.
> At the end of tests, check:
> $ cat /proc/vmstat
> nr_oom_victims 6
>
> As the counter shows, there were 6 OOM kill victims.
>
> An OOM is bad for any system, so this counter can help in quickly tuning
> the OOM behavior of the system without depending on the logs.
>
> Signed-off-by: Pintu Kumar <pintu.k@xxxxxxxxxxx>
> ---
>  include/linux/vm_event_item.h |    1 +
>  mm/oom_kill.c                 |    2 ++
>  mm/page_alloc.c               |    1 -
>  mm/vmstat.c                   |    1 +
>  4 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 2b1cef8..dd2600d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -57,6 +57,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  #ifdef CONFIG_HUGETLB_PAGE
>  		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
>  #endif
> +		NR_OOM_VICTIMS,
>  		UNEVICTABLE_PGCULLED,	/* culled to noreclaim list */
>  		UNEVICTABLE_PGSCANNED,	/* scanned for reclaimability */
>  		UNEVICTABLE_PGRESCUED,	/* rescued from noreclaim list */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 03b612b..802b8a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -570,6 +570,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  	 * space under its control.
>  	 */
>  	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> +	count_vm_event(NR_OOM_VICTIMS);
>  	mark_oom_victim(victim);
>  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
>  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
> @@ -600,6 +601,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  			task_pid_nr(p), p->comm);
>  		task_unlock(p);
>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
> +		count_vm_event(NR_OOM_VICTIMS);
>  	}
>  	rcu_read_unlock();
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9bcfd70..fafb09d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2761,7 +2761,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		schedule_timeout_uninterruptible(1);
>  		return NULL;
>  	}
> -
>  	/*
>  	 * Go through the zonelist yet one more time, keep very high watermark
>  	 * here, this is only to catch a parallel oom killing, we must fail if
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1fd0886..8503a2e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -808,6 +808,7 @@ const char * const vmstat_text[] = {
>  	"htlb_buddy_alloc_success",
>  	"htlb_buddy_alloc_fail",
>  #endif
> +	"nr_oom_victims",
>  	"unevictable_pgs_culled",
>  	"unevictable_pgs_scanned",
>  	"unevictable_pgs_rescued",
> --
> 1.7.9.5

Regards,
Pintu