RE: [PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat counter

PINTU KUMAR <pintu.k@xxxxxxxxxxx> · Mon, 12 Oct 2015 20:14:21 +0530

Hi,

Sorry, I forgot to mention the V2 update.
I will highlight the V2 changes and RESEND.

> -----Original Message-----
> From: Pintu Kumar [mailto:pintu.k@xxxxxxxxxxx]
> Sent: Monday, October 12, 2015 7:03 PM
> To: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx;
> pintu.k@xxxxxxxxxxx; mhocko@xxxxxxx; koct9i@xxxxxxxxx;
> rientjes@xxxxxxxxxx; hannes@xxxxxxxxxxx; penguin-kernel@i-love.sakura.ne.jp;
> bywxiaobai@xxxxxxx; mgorman@xxxxxxx; vbabka@xxxxxxx;
> js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx;
> fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
> Cc: cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx;
> sreenathd@xxxxxxxxxxx
> Subject: [PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat counter
> 
> This patch adds a count of OOM kill victims to /proc/vmstat.
> Currently, we depend on the kernel logs to learn that a kernel OOM occurred.
> But a kernel OOM can go unnoticed by the developer, as it can silently
> kill some background applications/services.
> On some small embedded systems, the OOM may be captured in the logs but
> later overwritten because the log ring buffer wraps.
> Thus this interface can quickly help the user determine whether any OOM
> kill has happened in the past, that is, whether the system has ever entered
> the OOM kill stage to date.
> 
> Thus, it can be beneficial in the following cases:
> 1. The user can monitor kernel OOM kills without looking into the
>    kernel logs.
> 2. It can help in tuning the watermark level in the system.
> 3. It can help in tuning the low memory killer behavior in user space.
> 4. It can be helpful on a logless system or if klogd logging
>    (/var/log/messages) is disabled.
> 
> A snapshot of the result of a 3-day overnight test run is shown below:
> System: ARM Cortex A7, 1GB RAM, 8GB EMMC
> Linux: 3.10.xx
> Category: reference smart phone device
> Loglevel: 7
> Conditions: Fully loaded, BT/WiFi/GPS ON
> Tests: auto launching of ~30+ apps using test scripts, in a loop for
> 3 days.
> At the end of tests, check:
> $ cat /proc/vmstat
> nr_oom_victims 6
> 
> As shown, the counter reports 6 OOM kill victims.
> 
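
For illustration, a user-space monitor could poll the counter along these
lines. This is only a minimal sketch: the helper names vmstat_match_line()
and vmstat_read() are made up for this example and are not part of the
patch, and the nr_oom_victims entry only exists on a kernel carrying this
patch. It assumes the usual "name value" per-line layout of /proc/vmstat.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "name value" line and return 0 if it
 * holds the wanted counter, storing the value in *out. Returns -1 for
 * any other line or for a line without a numeric value.
 */
static int vmstat_match_line(const char *line, const char *name,
			     unsigned long long *out)
{
	char key[64];
	unsigned long long val;

	if (sscanf(line, "%63s %llu", key, &val) != 2)
		return -1;
	if (strcmp(key, name) != 0)
		return -1;
	*out = val;
	return 0;
}

/*
 * Hypothetical helper: scan a vmstat-style file for one counter.
 * Returns 0 on success, -1 if the file cannot be opened or the
 * counter is not present (e.g. on a kernel without this patch).
 */
static int vmstat_read(const char *path, const char *name,
		       unsigned long long *out)
{
	char line[128];
	int ret = -1;
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -1;
	while (fgets(line, sizeof(line), fp)) {
		if (vmstat_match_line(line, name, out) == 0) {
			ret = 0;
			break;
		}
	}
	fclose(fp);
	return ret;
}
```

A monitor would call vmstat_read("/proc/vmstat", "nr_oom_victims", &v)
periodically and compare v with the previous sample; a return of -1 would
mean the counter is not exported on that kernel.
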
> An OOM is bad for any system. So, this counter can help in quickly tuning the
> OOM behavior of the system, without depending on the logs.
> 
> Signed-off-by: Pintu Kumar <pintu.k@xxxxxxxxxxx>
> ---
>  include/linux/vm_event_item.h |    1 +
>  mm/oom_kill.c                 |    2 ++
>  mm/page_alloc.c               |    1 -
>  mm/vmstat.c                   |    1 +
>  4 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 2b1cef8..dd2600d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -57,6 +57,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  #ifdef CONFIG_HUGETLB_PAGE
>  		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
>  #endif
> +		NR_OOM_VICTIMS,
>  		UNEVICTABLE_PGCULLED,	/* culled to noreclaim list */
>  		UNEVICTABLE_PGSCANNED,	/* scanned for reclaimability */
>  		UNEVICTABLE_PGRESCUED,	/* rescued from noreclaim list */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 03b612b..802b8a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -570,6 +570,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  	 * space under its control.
>  	 */
>  	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> +	count_vm_event(NR_OOM_VICTIMS);
>  	mark_oom_victim(victim);
>  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
>  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
> @@ -600,6 +601,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  				task_pid_nr(p), p->comm);
>  			task_unlock(p);
>  			do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
> +			count_vm_event(NR_OOM_VICTIMS);
>  		}
>  	rcu_read_unlock();
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9bcfd70..fafb09d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2761,7 +2761,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		schedule_timeout_uninterruptible(1);
>  		return NULL;
>  	}
> -
>  	/*
>  	 * Go through the zonelist yet one more time, keep very high watermark
>  	 * here, this is only to catch a parallel oom killing, we must fail if
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 1fd0886..8503a2e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -808,6 +808,7 @@ const char * const vmstat_text[] = {
>  	"htlb_buddy_alloc_success",
>  	"htlb_buddy_alloc_fail",
>  #endif
> +	"nr_oom_victims",
>  	"unevictable_pgs_culled",
>  	"unevictable_pgs_scanned",
>  	"unevictable_pgs_rescued",
> --
> 1.7.9.5

Regards,
Pintu
