On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote: > On Wed, 29 Feb 2012 10:12:33 -0800 > Nishanth Aravamudan <nacc@xxxxxxxxxxxxxxxxxx> wrote: > > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory > > Overcommit) on powerpc, we tripped the following: > > > > kernel BUG at mm/bootmem.c:483! > > > > ... > > > > This is > > > > BUG_ON(limit && goal + size > limit); > > > > and after some debugging, it seems that > > > > goal = 0x7ffff000000 > > limit = 0x80000000000 > > > > and sparse_early_usemaps_alloc_node -> > > sparse_early_usemaps_alloc_pgdat_section calls > > > > return alloc_bootmem_section(usemap_size() * count, section_nr); > > > > This is on a system with 8TB available via the AMS pool, and as a quirk > > of AMS in firmware, all of that memory shows up in node 0. So, we end up > > with an allocation that will fail the goal/limit constraints. In theory, > > we could "fall-back" to alloc_bootmem_node() in > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE > > defined, we'll BUG_ON() instead. A simple solution appears to be to > > unconditionally remove the limit condition in alloc_bootmem_section, > > meaning allocations are allowed to cross section boundaries (necessary > > for systems of this size). > > > > Johannes Weiner pointed out that if alloc_bootmem_section() no longer > > guarantees section-locality, we need check_usemap_section_nr() to print > > possible cross-dependencies between node descriptors and the usemaps > > allocated through it. That makes the two loops in > > sparse_early_usemaps_alloc_node() identical, so re-factor the code a > > bit. > > The patch is a bit scary now, so I think we should merge it into > 3.4-rc1 and then backport it into 3.3.1 if nothing blows up. I think that's fair. > Do you think it should be backported into 3.3.x? Earlier kernels? 3.3.x seems reasonable. If I had to guess, I think this could be hit on any kernels with this functionality -- that is, sparsemem in general? Not sure how far back it's worth backporting. > Also, this? Urgh, yeah, that's way better. Acked-by: Nishanth Aravamudan <nacc@xxxxxxxxxx> > --- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix > +++ a/mm/bootmem.c > @@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi > unsigned long section_nr) > { > bootmem_data_t *bdata; > - unsigned long pfn, goal, limit; > + unsigned long pfn, goal; > > pfn = section_nr_to_pfn(section_nr); > goal = pfn << PAGE_SHIFT; > - limit = 0; > bdata = &bootmem_node_data[early_pfn_to_nid(pfn)]; > > - return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit); > + return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0); > } > #endif Thanks for all the feedback! -Nish -- Nishanth Aravamudan <nacc@xxxxxxxxxx> IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>