Hi, Liu Jiang

At 11/14/2012 10:52 PM, Jiang Liu wrote:
> On 11/07/2012 04:43 AM, Andrew Morton wrote:
>> On Tue, 6 Nov 2012 09:31:57 +0800
>> Jiang Liu <jiang.liu@xxxxxxxxxx> wrote:
>>
>>> Changeset 7f1290f2f2 tries to fix an issue when calculating
>>> zone->present_pages, but it causes a regression on 32-bit systems with
>>> HIGHMEM. With that changeset, reset_zone_present_pages() resets every
>>> zone->present_pages to zero, and fixup_zone_present_pages() is called
>>> to recalculate zone->present_pages when the bootmem allocator frees
>>> core memory pages into the buddy allocator. Because highmem pages are
>>> not freed by the bootmem allocator, every highmem zone's present_pages
>>> becomes zero.
>>>
>>> Actually there's no need to recalculate present_pages for highmem zones,
>>> because the bootmem allocator never allocates pages from them. So fix the
>>> regression by skipping highmem in reset_zone_present_pages() and
>>> fixup_zone_present_pages().
>>>
>>> ...
>>>
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>>>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>>>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  			z = NODE_DATA(nid)->node_zones + i;
>>> -			z->present_pages = 0;
>>> +			if (!is_highmem(z))
>>> +				z->present_pages = 0;
>>>  		}
>>>  	}
>>>  }
>>> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>>>  
>>>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>>>  		z = NODE_DATA(nid)->node_zones + i;
>>> +		if (is_highmem(z))
>>> +			continue;
>>> +
>>>  		zone_start_pfn = z->zone_start_pfn;
>>>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
>>> -
>>> -		/* if the two regions intersect */
>>>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>>>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>>>  					    max(start_pfn, zone_start_pfn);
>>
>> This ... isn't very nice. It embeds within
>> reset_zone_present_pages() and fixup_zone_present_pages() knowledge
>> about their caller's state.
>> Or, more specifically, it is embedding
>> knowledge about the overall state of the system when these functions
>> are called.
>>
>> I mean, a function called "reset_zone_present_pages" should reset
>> ->present_pages!
>>
>> The fact that fixup_zone_present_pages() has multiple call sites makes
>> this all even more risky. And what are the interactions between this
>> and memory hotplug?
>>
>> Can we find a cleaner fix?
>>
>> Please tell us more about what's happening here. Is it the case that
>> reset_zone_present_pages() is being called *after* highmem has been
>> populated? If so, then fixup_zone_present_pages() should work
>> correctly for highmem? Or is it the case that highmem hasn't yet been
>> set up? IOW, what is the sequence of operations here?
>>
>> Is the problem that we're *missing* a call to
>> fixup_zone_present_pages(), perhaps? If we call
>> fixup_zone_present_pages() after highmem has been populated,
>> fixup_zone_present_pages() should correctly fill in the highmem zone's
>> ->present_pages?
> Hi Andrew,
> Sorry for the late response :(
> I have investigated further following your suggestions. Currently
> we only call fixup_zone_present_pages() for memory freed by the bootmem
> allocator, so HIGHMEM pages are missed. We could also call
> fixup_zone_present_pages() for HIGHMEM pages, but that would require
> changing arch-specific code for x86, powerpc, sparc, microblaze, arm,
> mips, um, tile, etc. That seems like too much overhead.
> And sadly, I found that the quick fix is still incomplete. The original
> patch has another issue: reset_zone_present_pages() is only called
> for IA64, so it will cause trouble for other arches that use "bootmem.c".
> I felt a little guilty and tried to find a cleaner solution that doesn't
> touch arch-specific code. But things are more complex than I expected, and
> I'm still working on it.
> So how about totally reverting changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125,
> and I will post another version once I find a cleaner way?

I think fixup_zone_present_pages() is very useful for memory hotplug.

We calculate zone->present_pages in free_area_init_core(), but its value is
wrong, which is why we fix it up in fixup_zone_present_pages().

What about this:
1. Initialize zone->present_pages to the number of present pages in the zone
   (including bootmem pages).
2. Don't reset zone->present_pages for HIGHMEM zones.

We don't allocate bootmem from HIGHMEM, so a highmem zone's present_pages is
already correct after step 1, and step 2 leaves it alone instead of resetting
it and fixing it up again.

Is this OK? If so, I will resend the patch for step 1 (the patch is from laijs).

Thanks
Wen Congyang

> Thanks!
> Gerry
>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .