Hi, > -----Original Message----- > From: Michal Hocko [mailto:mhocko@xxxxxxxxxx] > Sent: Thursday, October 01, 2015 7:09 PM > To: Pintu Kumar > Cc: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx; > koct9i@xxxxxxxxx; rientjes@xxxxxxxxxx; hannes@xxxxxxxxxxx; penguin- > kernel@xxxxxxxxxxxxxxxxxxx; bywxiaobai@xxxxxxx; mgorman@xxxxxxx; > vbabka@xxxxxxx; js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx; > alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx; > fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx; > vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx; > sreenathd@xxxxxxxxxxx > Subject: Re: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter > > On Thu 01-10-15 16:18:43, Pintu Kumar wrote: > > This patch maintains number of oom calls and number of oom kill count > > in /proc/vmstat. > > It is helpful during sluggish, aging or long duration tests. > > Currently if the OOM happens, it can be only seen in kernel ring buffer. > > But during long duration tests, all the dmesg and /var/log/messages* > > could be overwritten. > > So, just like other counters, the oom can also be maintained in > > /proc/vmstat. > > It can be also seen if all logs are disabled in kernel. > > > > A snapshot of the result of over night test is shown below: > > $ cat /proc/vmstat > > oom_stall 610 > > oom_kill_count 1763 > > > > Here, oom_stall indicates that there are 610 times, kernel entered > > into OOM cases. However, there were around 1763 oom killing happens. > > This alone looks quite suspicious. Unless you have tasks which share the address > space without being in the same thread group this shouldn't happen in such a > large scale. Yes, this accounts for out_of_memory even from memory cgroups. Please check few snapshots of dmesg outputs captured during over-night tests. ........ [49479.078033] [2: xxxxxxxx:20874] Memory cgroup out of memory: Kill process 20880 (xxxxxxx) score 112 or sacrifice child [49480.910430] [2: xxxxxxxx:20882] Memory cgroup out of memory: Kill process 20888 (xxxxxxxx) score 112 or sacrifice child [49567.046203] [0: yyyyyyy: 548] Out of memory: Kill process 20458 (zzzzzzzzzz) score 102 or sacrifice child [49567.346588] [0: yyyyyyy: 548] Out of memory: Kill process 21102 (zzzzzzzzzz) score 104 or sacrifice child ......... The _out of memory_ count in dmesg dump output exactly matches the number in /proc/vmstat -> oom_kill_count > </me looks into the patch> > And indeed the patch is incorrect. You are only counting OOMs from the page > allocator slow path. You are missing all the OOM invocations from the page fault > path. Sorry, I am not sure what exactly you mean. Please point me out if I am missing some places. Actually, I tried to add it at generic place that is; oom_kill_process, which is called by out_of_memory(...). Are you talking about: pagefault_out_of_memory(...) ? But, this is already calling: out_of_memory. No? > The placement inside __alloc_pages_may_oom looks quite arbitrary as well. You > are not counting events where we are OOM but somebody is holding the > oom_mutex but you do count last attempt before going really OOM. Then we > have cases which do not invoke OOM killer which are counted into oom_stall as > well. I am not sure whether they should because I am not quite sure about the > semantic of the counter in the first place. Ok. Yes, it can be added right after it enters into __alloc_pages_may_oom. I will make the changes. Actually, I knowingly skipped the oom_lock case, because in our 3.10 kernel, we had note_oom_kill(..) Added right after this check. So, I also added it exactly at the same place. Ok, I can make the necessary changes, if the oom_lock case also matters. > What is it supposed to tell us? How many times the system had to go into > emergency OOM steps? How many times the direct reclaim didn't make any > progress so we can consider the system OOM? > Yes, exactly, oom_stall can tell, how many times OOM is invoked in the system. Yes, it can also tell how many times direct_reclaim fails completely. Currently, we don't have any counter for direct_reclaim success/fail. Also, oom_kill_process will not be invoked for higher orders (PAGE_ALLOC_COSTLY_ORDER). But, it will enter OOM and results into straight page allocation failure. > oom_kill_count has a slightly misleading names because it suggests how many > times oom_kill was called but in fact it counts the oom victims. > Not sure whether this information is so much useful but the semantic is clear at > least. > Ok, agree about the semantic of the name: oom_kill_count. If possible please suggest a better name. How about the following names? oom_victim_count ? oom_nr_killed ? oom_nr_victim ? > > The OOM is bad for the any system. So, this counter can help the > > developer in tuning the memory requirement at least during initial bringup. > > > > Signed-off-by: Pintu Kumar <pintu.k@xxxxxxxxxxx> > > --- > > include/linux/vm_event_item.h | 2 ++ > > mm/oom_kill.c | 2 ++ > > mm/page_alloc.c | 2 +- > > mm/vmstat.c | 2 ++ > > 4 files changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/vm_event_item.h > > b/include/linux/vm_event_item.h index 2b1cef8..ade0851 100644 > > --- a/include/linux/vm_event_item.h > > +++ b/include/linux/vm_event_item.h > > @@ -57,6 +57,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, > > PSWPOUT, #ifdef CONFIG_HUGETLB_PAGE > > HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, #endif > > + OOM_STALL, > > + OOM_KILL_COUNT, > > UNEVICTABLE_PGCULLED, /* culled to noreclaim list */ > > UNEVICTABLE_PGSCANNED, /* scanned for reclaimability */ > > UNEVICTABLE_PGRESCUED, /* rescued from noreclaim list */ > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 03b612b..e79caed > > 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -570,6 +570,7 @@ void oom_kill_process(struct oom_control *oc, struct > task_struct *p, > > * space under its control. > > */ > > do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true); > > + count_vm_event(OOM_KILL_COUNT); > > mark_oom_victim(victim); > > pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file- > rss:%lukB\n", > > task_pid_nr(victim), victim->comm, K(victim->mm->total_vm), > @@ > > -600,6 +601,7 @@ void oom_kill_process(struct oom_control *oc, struct > task_struct *p, > > task_pid_nr(p), p->comm); > > task_unlock(p); > > do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true); > > + count_vm_event(OOM_KILL_COUNT); > > } > > rcu_read_unlock(); > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9bcfd70..1d82210 > > 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2761,7 +2761,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned > int order, > > schedule_timeout_uninterruptible(1); > > return NULL; > > } > > - > > + count_vm_event(OOM_STALL); > > /* > > * Go through the zonelist yet one more time, keep very high watermark > > * here, this is only to catch a parallel oom killing, we must fail > > if diff --git a/mm/vmstat.c b/mm/vmstat.c index 1fd0886..f054265 > > 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -808,6 +808,8 @@ const char * const vmstat_text[] = { > > "htlb_buddy_alloc_success", > > "htlb_buddy_alloc_fail", > > #endif > > + "oom_stall", > > + "oom_kill_count", > > "unevictable_pgs_culled", > > "unevictable_pgs_scanned", > > "unevictable_pgs_rescued", > > -- > > 1.7.9.5 > > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>