On 03/06/2013 05:07 AM, Kamezawa Hiroyuki wrote: > (2013/03/05 22:10), Glauber Costa wrote: >> If we are not using memcg, there is no reason why we should allocate >> this structure, that will be a memory waste at best. We can do better >> at least in the sparsemem case, and allocate it when the first cgroup >> is requested. It should now not panic on failure, and we have to handle >> this right. >> >> flatmem case is a bit more complicated, so that one is left out for >> the moment. >> >> Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx> >> CC: Michal Hocko <mhocko@xxxxxxx> >> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> >> CC: Johannes Weiner <hannes@xxxxxxxxxxx> >> CC: Mel Gorman <mgorman@xxxxxxx> >> CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> --- >> include/linux/page_cgroup.h | 28 +++++---- >> init/main.c | 2 - >> mm/memcontrol.c | 3 +- >> mm/page_cgroup.c | 150 ++++++++++++++++++++++++-------------------- >> 4 files changed, 99 insertions(+), 84 deletions(-) > > This patch seems a complicated mixture of clean-up and what-you-really-want. > I swear it is all what-I-really-want, any cleanups are non-intentional! >> -#if !defined(CONFIG_SPARSEMEM) >> +static void *alloc_page_cgroup(size_t size, int nid) >> +{ >> + gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN; >> + void *addr = NULL; >> + >> + addr = alloc_pages_exact_nid(nid, size, flags); >> + if (addr) { >> + kmemleak_alloc(addr, size, 1, flags); >> + return addr; >> + } > > As far as I remember, this function was written for SPARSEMEM. > > How big this "size" will be with FLATMEM/DISCONTIGMEM ? > if 16GB, 16 * 1024 * 1024 * 1024 / 4096 * 16 = 64MB. > > What happens if order > MAX_ORDER is passed to alloc_pages()...no warning ? > > How about using vmalloc always if not SPARSEMEM ? I don't oppose. >> >> -void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat) >> +static void free_page_cgroup(void *addr) >> +{ >> + if (is_vmalloc_addr(addr)) { >> + vfree(addr); >> + } else { >> + struct page *page = virt_to_page(addr); >> + int nid = page_to_nid(page); >> + BUG_ON(PageReserved(page)); > > This BUG_ON() can be removed. > You are right, although it is still a bug =) >> + free_pages_exact(addr, page_cgroup_table_size(nid)); >> + } >> +} >> + >> +static void page_cgroup_msg(void) >> +{ >> + printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage); >> + printk(KERN_INFO "please try 'cgroup_disable=memory' option if you " >> + "don't want memory cgroups.\nAlternatively, consider " >> + "deferring your memory cgroups creation.\n"); >> +} > > I think this warning can be removed because it's not boot option problem > after this patch. I guess the boot option can be obsolete.... > I think it is extremely useful, at least during the next couple of releases. A lot of distributions will create memcgs for no apparent reasons way before they are used (if used at all), as a placeholder only. This can at least tell them that there is a way to stop paying a memory penalty (together with the actual memory footprint) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>