On Tue, Jun 07, 2011 at 09:57:08AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 6 Jun 2011 14:45:19 -0700
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > Hopefully he can test this one for us as well, thanks.
> > 
> A patch with a better description (of mine) is here.
> Anyway, I felt I needed a fix for the ARM special case.
> 
> ==
> fix-init-page_cgroup-for-sparsemem-taking-care-of-broken-page-flags.patch
> 
> Even with SPARSEMEM, there are some magical memmaps.
> Who wants to introduce SPARSEMEM_MAGICAL?
> 
> If a node is not aligned to a section, the memmap for pfns outside the
> node's range is not initialized, and page->flags contains 0.
> 

This is tangential, but it might be worth introducing a
CONFIG_DEBUG_MEMORY_MODEL that WARN_ONs on page->flags == 0 in
pfn_to_page() to catch some accesses outside node boundaries. Not for
this bug though.

> If Node(0) doesn't exist, NODE_DATA(pfn_to_nid(pfn)) causes an error.
> 

Well, not in itself. It causes a bug when we try to allocate memory
from node 0, but there is a subtle performance bug here as well: for
unaligned nodes, the cgroup information can be allocated from node 0
instead of node-local.

> In another case, for example, ARM frees memmap that is never used,
> even under SPARSEMEM. In that case, page->flags will contain a broken
> value.
> 

Again, not as such. In that case, the struct page is not valid memory
at all.

> This patch does a strict check on the nid obtained by pfn_to_page()
> and uses a proper NID for the page_cgroup allocation.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> ---
>  mm/page_cgroup.c |   36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 
> Index: linux-3.0-rc1/mm/page_cgroup.c
> ===================================================================
> --- linux-3.0-rc1.orig/mm/page_cgroup.c
> +++ linux-3.0-rc1/mm/page_cgroup.c
> @@ -168,6 +168,7 @@ static int __meminit init_section_page_c
>  	struct mem_section *section;
>  	unsigned long table_size;
>  	unsigned long nr;
> +	unsigned long tmp;
>  	int nid, index;
>  
>  	nr = pfn_to_section_nr(pfn);
> @@ -175,8 +176,41 @@ static int __meminit init_section_page_c
>  
>  	if (section->page_cgroup)
>  		return 0;
> +	/*
> +	 * Check the node ID. Because 'pfn' is obtained by calculation,
> +	 * the pfn may "not exist" or be "already freed". Even if
> +	 * pfn_valid() returns true, page->flags may contain a broken
> +	 * value and pfn_to_nid() may return a bad value.
> +	 * (See CONFIG_ARCH_HAS_HOLES_MEMORYMODEL and ARM's free_memmap())
> +	 * So, we need to do a careful check here.
> +	 */

You don't really need to worry about ARM here as long as you stay
within node boundaries, and you only care about the first valid page
in the node. Why not look up NODE_DATA(nid) and make sure start and
end are within the node boundaries?

> +	for (tmp = pfn;
> +	     tmp < pfn + PAGES_PER_SECTION;
> +	     tmp += MAX_ORDER_NR_PAGES, nid = -1) {
> +		struct page *page;
> +
> +		if (!pfn_valid(tmp))
> +			continue;
> +
> +		page = pfn_to_page(tmp);
> +		nid = page_to_nid(page);
> 
> -	nid = page_to_nid(pfn_to_page(pfn));
> +		/*
> +		 * If 'page' isn't initialized or has been freed, it may
> +		 * contain broken information.
> +		 */
> +		if (!node_state(nid, N_NORMAL_MEMORY))
> +			continue;
> +		if (page_to_pfn(pfn_to_page(tmp)) != tmp)
> +			continue;
> +		/*
> +		 * The page seems valid and this 'nid' is safe to access,
> +		 * at least.
> +		 */
> +		break;
> +	}
> +	if (nid == -1)
> +		return 0;
>  	table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
>  	base = alloc_page_cgroup(table_size, nid);

-- 
Mel Gorman
SUSE Labs