On Tue, Jun 07, 2011 at 06:06:30PM +0900, KAMEZAWA Hiroyuki wrote: > On Tue, 7 Jun 2011 10:03:13 +0100 > Mel Gorman <mgorman@xxxxxxx> wrote: > > > On Tue, Jun 07, 2011 at 09:57:08AM +0900, KAMEZAWA Hiroyuki wrote: > > > On Mon, 6 Jun 2011 14:45:19 -0700 > > > Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > Hopefully he can test this one for us as well, thanks. > > > > > > > > > > A patch with better description (of mine) is here. > > > Anyway, I felt I needed a fix for ARM special case. > > > > > > == > > > fix-init-page_cgroup-for-sparsemem-taking-care-of-broken-page-flags.patch > > > Even with SPARSEMEM, there are some magical memmap. > > > > > > > Who wants to introduce SPARSEMEM_MAGICAL? > > > > ARM guys ;) > > > > If a Node is not aligned to SECTION, memmap of pfn which is out of > > > Node's range is not initialized. And page->flags contains 0. > > > > > > > This is tangential but it might be worth introducing > > CONFIG_DEBUG_MEMORY_MODEL that WARN_ONs page->flag == 0 in > > pfn_to_page() to catch some accesses outside node boundaries. Not for > > this bug though. > > > > Hmm, buf if zone == 0 && section == 0 && nid == 0, page->flags is 0. > Sorry, what I meant to suggest was that page->flags outside of boundaries be initialised to a poison value that is an impossible combination of flags and check that. > > > If Node(0) doesn't exist, NODE_DATA(pfn_to_nid(pfn)) causes error. > > > > > > > Well, not in itself. It causes a bug when we try allocate memory > > from node 0 but there is a subtle performance bug here as well. For > > unaligned nodes, the cgroup information can be allocated from node > > 0 instead of node-local. > > > > > In another case, for example, ARM frees memmap which is never be used > > > even under SPARSEMEM. In that case, page->flags will contain broken > > > value. > > > > > > > Again, not as such. In that case, struct page is not valid memory > > at all. > > Hmm, IIUC, ARM's code frees memmap by free_bootmem().....so, memory used > for 'struct page' is valid and can access (but it's not struct page.) > > If my English sounds strange, I'm sorry. Hm > > How about this ? > == > In another case, for example, ARM frees memmap which is never be used > and reuse memory for memmap for other purpose. So, in that case, > a page got by pfn_to_page(pfn) may not a struct page. > == > Much better. > > > > > > > This patch does a strict check on nid which is obtained by > > > pfn_to_page() and use proper NID for page_cgroup allocation. > > > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> > > > > > > --- > > > mm/page_cgroup.c | 36 +++++++++++++++++++++++++++++++++++- > > > 1 file changed, 35 insertions(+), 1 deletion(-) > > > > > > Index: linux-3.0-rc1/mm/page_cgroup.c > > > =================================================================== > > > --- linux-3.0-rc1.orig/mm/page_cgroup.c > > > +++ linux-3.0-rc1/mm/page_cgroup.c > > > @@ -168,6 +168,7 @@ static int __meminit init_section_page_c > > > struct mem_section *section; > > > unsigned long table_size; > > > unsigned long nr; > > > + unsigned long tmp; > > > int nid, index; > > > > > > nr = pfn_to_section_nr(pfn); > > > @@ -175,8 +176,41 @@ static int __meminit init_section_page_c > > > > > > if (section->page_cgroup) > > > return 0; > > > + /* > > > + * check Node-ID. Because we get 'pfn' which is obtained by calculation, > > > + * the pfn may "not exist" or "alreay freed". Even if pfn_valid() returns > > > + * true, page->flags may contain broken value and pfn_to_nid() returns > > > + * bad value. > > > + * (See CONFIG_ARCH_HAS_HOLES_MEMORYMODEL and ARM's free_memmap()) > > > + * So, we need to do careful check, here. > > > + */ > > > > You don't really need to worry about ARM here as long as you stay > > within node boundaries and you only care about the first valid page > > in the node. Why not lookup NODE_DATA(nid) and make sure start and > > end are within the node boundaries? > > > > I thought ARM's code just takes care of MAX_ORDER alignment.. Which is not the same as section alignment and whatever alignment it's using, the start of the node is still going to be valid. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>