On Wed, 13 Apr 2011 13:12:33 -0700 (PDT)
David Rientjes <rientjes@xxxxxxxxxx> wrote:

> On Tue, 29 Mar 2011, Ying Han wrote:
>
> > Two new stats in per-memcg memory.stat which track the number of
> > page faults and the number of major page faults:
> >
> > "pgfault"
> > "pgmajfault"
> >
> > They are different from the "pgpgin"/"pgpgout" stats, which count the
> > number of pages charged/discharged to the cgroup and say nothing about
> > reading/writing pages to disk.
> >
> > It is valuable to track these two stats both for measuring an
> > application's performance and for gauging the efficiency of the kernel
> > page reclaim path. Counting page faults per process is useful, but we
> > also need the aggregated value, since processes are monitored and
> > controlled on a per-cgroup basis in memcg.
> >
> > Functional test: check the total number of pgfault/pgmajfault of all
> > memcgs and compare with the global vmstat values:
> >
> > $ cat /proc/vmstat | grep fault
> > pgfault 1070751
> > pgmajfault 553
> >
> > $ cat /dev/cgroup/memory.stat | grep fault
> > pgfault 1071138
> > pgmajfault 553
> > total_pgfault 1071142
> > total_pgmajfault 553
> >
> > $ cat /dev/cgroup/A/memory.stat | grep fault
> > pgfault 199
> > pgmajfault 0
> > total_pgfault 199
> > total_pgmajfault 0
> >
> > Performance test: run the page fault test (pft) with 16 threads,
> > faulting in 15G of anon pages in a 16G container. No regression was
> > noticed in "flt/cpu/s".
> >
> > Sample output from pft:
> > TAG pft:anon-sys-default:
> >   Gb  Thr CLine   User      System     Wall    flt/cpu/s  fault/wsec
> >   15   16    1    0.67s    233.41s    14.76s   16798.546  266356.260
> >
> > +-------------------------------------------------------------------------+
> >     N         Min          Max        Median        Avg        Stddev
> > x  10    16682.962    17344.027    16913.524    16928.812     166.5362
> > +  10    16695.568    16923.896    16820.604    16824.652    84.816568
> > No difference proven at 95.0% confidence
> >
> > Change v3..v2
> > 1. removed the unnecessary function definition in memcontrol.h
> >
> > Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>
> I'm wondering if we can just modify count_vm_event() directly for
> CONFIG_CGROUP_MEM_RES_CTLR so that we automatically track all vmstat items
> (those in enum vm_event_item) for each memcg. We could add an array of
> NR_VM_EVENT_ITEMS into each struct mem_cgroup to be incremented on
> count_vm_event() for current's memcg.
>
> If that's done, we wouldn't have to add additional calls for every vmstat
> item we want to duplicate from the global counters.
>

Maybe we will do that eventually. For now, IIUC, over 50% of the VM_EVENTS
are not needed by memcg (e.g. the per-zone stats), and such an array would
consume a large amount of percpu area. I think we would need to select
events carefully even if we go that way.

Also, memcg's current percpu stats are a mixture of vm_events and vm_stat;
we may need to sort them out and redesign them.

My concern is that I'm not sure we have enough percpu area for
vmstat+vmevents with 1000+ memcgs, or whether using that much is acceptable
even if we can. But yes, it seems worth considering.

Thanks,
-Kame
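
For concreteness, here is a rough sketch of what David's suggestion above
could look like: a percpu array of NR_VM_EVENT_ITEMS counters hung off each
struct mem_cgroup, bumped from count_vm_event(). This is illustration only,
not the posted patch; the field name "events_cpu" and the helper
mem_cgroup_count_vm_event() are invented here, and mem_cgroup_from_task()
is assumed to be usable from this context.

/*
 * Rough sketch only -- not the posted patch.  "events_cpu" and
 * mem_cgroup_count_vm_event() are made-up names for illustration.
 */
#ifdef CONFIG_CGROUP_MEM_RES_CTLR

/* One counter per vm_event_item, allocated percpu for each memcg. */
struct mem_cgroup_events_cpu {
	unsigned long events[NR_VM_EVENT_ITEMS];
};

struct mem_cgroup {
	/* ... existing fields ... */
	struct mem_cgroup_events_cpu __percpu *events_cpu;
};

/* Mirror a global vm_event into the memcg that current belongs to. */
static inline void mem_cgroup_count_vm_event(enum vm_event_item item)
{
	struct mem_cgroup *memcg;

	rcu_read_lock();
	memcg = mem_cgroup_from_task(current);
	if (memcg && memcg->events_cpu)
		this_cpu_inc(memcg->events_cpu->events[item]);
	rcu_read_unlock();
}

#else
static inline void mem_cgroup_count_vm_event(enum vm_event_item item)
{
}
#endif /* CONFIG_CGROUP_MEM_RES_CTLR */

/* include/linux/vmstat.h: the existing global increment grows one hook. */
static inline void count_vm_event(enum vm_event_item item)
{
	this_cpu_inc(vm_event_states.event[item]);
	mem_cgroup_count_vm_event(item);	/* new: per-memcg mirror */
}

The cost of this scheme is exactly what Kame points out: NR_VM_EVENT_ITEMS
unsigned longs per CPU per memcg, which with 1000+ memcgs is a lot of percpu
area spent largely on events memcg never reports -- which is why the posted
patch mirrors only the two fault counters.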