On Fri 09-11-12 18:23:07, Sha Zhengju wrote:
> On 11/09/2012 12:25 AM, Michal Hocko wrote:
> >On Thu 08-11-12 23:52:47, Sha Zhengju wrote:
[...]
> >>+	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> >>+		long long val = 0;
> >>+		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> >>+			continue;
> >>+		for_each_mem_cgroup_tree(mi, memcg)
> >>+			val += mem_cgroup_read_stat(mi, i);
> >>+		printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
> >>+	}
> >>+
> >>+	for (i = 0; i < NR_LRU_LISTS; i++) {
> >>+		unsigned long long val = 0;
> >>+
> >>+		for_each_mem_cgroup_tree(mi, memcg)
> >>+			val += mem_cgroup_nr_lru_pages(mi, BIT(i));
> >>+		printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
> >>+	}
> >>+	printk(KERN_CONT "\n");
> >
> >This is nice and simple; I am just thinking whether it is enough. Say
> >that you have a deeper hierarchy and there is a safety limit at its
> >root:
> >
> >        A (limit)
> >       /|\
> >      B C D
> >          |\
> >          E F
> >
> >and we trigger an OOM on A's limit. Now we know that something blew
> >up, but not what it was. Wouldn't it be better to swap the for
> >and for_each_mem_cgroup_tree loops? Then we would see the whole
> >hierarchy and could potentially point at the group which doesn't behave:
> >
> >Memory cgroup stats for A/: ...
> >Memory cgroup stats for A/B/: ...
> >Memory cgroup stats for A/C/: ...
> >Memory cgroup stats for A/D/: ...
> >Memory cgroup stats for A/D/E/: ...
> >Memory cgroup stats for A/D/F/: ...
> >
> >Would it still fit in with your use case?
> >[...]
>
> We haven't used such a complicated hierarchy yet, but it sounds like a
> good suggestion. :)
> Hierarchy is a little complex to use in our experience, and the three
> cgroups involved in a memcg OOM can be different: the memcg of the
> invoker, the memcg of the killed task, and the memcg going over its
> limit. Suppose a process in B triggers an OOM and a victim in root A is
> selected to be killed; we may as well want to know the memcg stats
> local to the A cgroup (excluding B, C and D). So besides the hierarchy
> info, would it be acceptable to also print the local root node stats,
> as I did in the V1 version (https://lkml.org/lkml/2012/7/30/179)?

Ohh, I probably wasn't clear enough. I didn't suggest cumulative
numbers, only per-group ones. So it would be something like:

	for_each_mem_cgroup_tree(mi, memcg) {
		printk("Memory cgroup stats for %s", memcg_name);

		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
				continue;
			printk(KERN_CONT "%s:%lldKB ",
					mem_cgroup_stat_names[i],
					K(mem_cgroup_read_stat(mi, i)));
		}

		for (i = 0; i < NR_LRU_LISTS; i++)
			printk(KERN_CONT "%s:%lluKB ",
					mem_cgroup_lru_names[i],
					K(mem_cgroup_nr_lru_pages(mi, BIT(i))));

		printk(KERN_CONT "\n");
	}

> Another one I'm hesitating about is the NUMA stats; it seems the
> output is beginning to get more and more....

NUMA stats are basically per-node, per-zone LRU data, so the
for (NR_LRU_LISTS) loop above can easily be extended to cover that as
well.
--
Michal Hocko
SUSE Labs
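
For illustration, one way that per-node extension of the LRU loop might
look is sketched below. The helper mem_cgroup_node_nr_lru_pages() and the
N_HIGH_MEMORY node mask are assumptions here (they are what backs
memory.numa_stat), and the output format is made up; treat this as a
rough sketch of the idea, not a tested patch:

	int nid, i;

	/*
	 * Sketch only: print per-node LRU numbers for one memcg, assuming
	 * mem_cgroup_node_nr_lru_pages() (the helper behind memory.numa_stat)
	 * is usable from this context.
	 */
	for_each_node_state(nid, N_HIGH_MEMORY) {
		printk(KERN_CONT "N%d=(", nid);
		for (i = 0; i < NR_LRU_LISTS; i++) {
			unsigned long nr;

			nr = mem_cgroup_node_nr_lru_pages(mi, nid, BIT(i));
			printk(KERN_CONT "%s:%luKB ",
					mem_cgroup_lru_names[i], K(nr));
		}
		printk(KERN_CONT ") ");
	}

Dropping something like this inside the per-group for_each_mem_cgroup_tree()
loop would keep the NUMA numbers grouped per cgroup, at the cost of
noticeably longer output lines.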