On Fri, 1 Mar 2019 15:19:50 -0500 Qian Cai <cai@xxxxxx> wrote:

> When onlining a memory block with DEBUG_PAGEALLOC, it unmaps the pages
> in the block from kernel, However, it does not map those pages while
> offlining at the beginning. As the result, it triggers a panic below
> while onlining on ppc64le as it checks if the pages are mapped before
> unmapping. However, the imbalance exists for all arches where
> double-unmappings could happen. Therefore, let kernel map those pages in
> generic_online_page() before they have being freed into the page
> allocator for the first time where it will set the page count to one.
>
> On the other hand, it works fine during the boot, because at least for
> IBM POWER8, it does,
>
> early_setup
>   early_init_mmu
>     harsh__early_init_mmu
>       htab_initialize [1]
>         htab_bolt_mapping [2]
>
> where it effectively map all memblock regions just like
> kernel_map_linear_page(), so later mem_init() -> memblock_free_all()
> will unmap them just fine without any imbalance. On other arches without
> this imbalance checking, it still unmap them once at the most.
>
> [1]
> for_each_memblock(memory, reg) {
>         base = (unsigned long)__va(reg->base);
>         size = reg->size;
>
>         DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
>                 base, size, prot);
>
>         BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
>                 prot, mmu_linear_psize, mmu_kernel_ssize));
> }
>
> [2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
>
> kernel BUG at arch/powerpc/mm/hash_utils_64.c:1815!
>
> ...
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -660,6 +660,7 @@ static void generic_online_page(struct page *page)
>  {
>  	__online_page_set_limits(page);
>  	__online_page_increment_counters(page);
> +	kernel_map_pages(page, 1, 1);
>  	__online_page_free(page);
>  }

This code was changed a lot by Arun's "mm/page_alloc.c: memory hotplug: free pages as higher order".
I don't think hotplug+DEBUG_PAGEALLOC is important enough to disrupt memory_hotplug-free-pages-as-higher-order.patch, which took a long time to sort out. So could you please take a look at linux-next, determine whether the problem is still there and propose a suitable patch? Thanks.
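
For anyone following along, the imbalance described in the quoted changelog can be sketched as a user-space toy model. This is not kernel code: `struct toy_page` and every `toy_*` function are hypothetical names invented for illustration. The model assumes only what the changelog states, namely that freeing a page under DEBUG_PAGEALLOC unmaps it and that ppc64le checks the page is mapped before unmapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state: is this page present in the linear mapping? */
struct toy_page {
	bool mapped;
};

/* Map the page (idempotent in this toy model). */
static void toy_map(struct toy_page *p)
{
	p->mapped = true;
}

/*
 * Unmap the page. Returns -1 where a strict architecture would hit
 * its BUG because the page is not currently mapped.
 */
static int toy_unmap(struct toy_page *p)
{
	if (!p->mapped)
		return -1;
	p->mapped = false;
	return 0;
}

/* Freeing into the allocator under DEBUG_PAGEALLOC unmaps the page. */
static int toy_free_page(struct toy_page *p)
{
	return toy_unmap(p);
}

/*
 * Onlining frees the page into the allocator. With map_first set, the
 * page is mapped beforehand, as the quoted patch does.
 */
static int toy_online_page(struct toy_page *p, bool map_first)
{
	if (map_first)
		toy_map(p);
	return toy_free_page(p);
}
```

In this model, a page that was mapped at boot and then freed stays unmapped through offlining, so `toy_online_page(&p, false)` reports the double unmap while `toy_online_page(&p, true)` does not. Whether the same holds after the higher-order freeing rework is exactly what needs checking against linux-next.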