On 02/24/2015 08:54 PM, Laura Abbott wrote: > Reviving this thread because I don't think it ever got resolved. > > On 2/3/2015 6:25 PM, Laura Abbott wrote: >> On 1/29/2015 5:13 AM, Vlastimil Babka wrote: >>>> >>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just >>>> back today after been offline a week so didn't review the patch but IIRC, >>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start >>>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0. >>>> I don't recall it being related to the alignment of node 0 so if there >>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET >>>> related then I'm surprised. >>> >>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit >>> 467bc461d2 which was a bugfix to your commit c713216dee, which did >>> introduce the mem_map correction code, and after which the code looked like: >>> >>> mem_map = NODE_DATA(0)->node_mem_map; >>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP >>> if (page_to_pfn(mem_map) != pgdat->node_start_pfn) >>> mem_map -= pgdat->node_start_pfn; >>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ >>> >>> >>> It's from 2006 so I can't expect you remember the details, but I had some >>> trouble finding out what this does. I assume it makes sure that mem_map points >>> to struct page corresponding to pfn 0, because that's what translations using >>> mem_map expect. >>> But pgdat->node_mem_map points to struct page corresponding to >>> pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn >>> to fix that. This is OK, as the node_mem_map is allocated (in this very >>> function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area >>> where node_mem_map may point to the middle of it. >>> >>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET. >>> So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which >>> is OK. But I still have few doubts: >>> >>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently >>> assumes that mem_map is allocated at the beginning of the node, i.e. at >>> pgdat->node_start_pfn. And the only reason for this if-condition to be true, >>> is that we haven't corrected the page_to_pfn translation, which uses mem_map. >>> Is this assumption always OK to do? Shouldn't the if-condition be instead about >>> pgdat->node_start_pfn not being aligned? >>> >>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called > CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead? >>> After all, we are correcting value of mem_map based on page_to_pfn code >>> variant used on FLATMEM. arm doesn't define >>> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction. >>> >> >> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't >> seem to be picked up properly for NOMMU arches properly. Probably just >> missing a header somewhere. >> >>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES, >>> so the offset between the start of the allocated map and where node_mem_map >>> points to will be up to MAX_ORDER_NR_PAGES. >>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET). >>> That looks like another silent assumption, that pgdat->node_start_pfn is always >>> between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were >>> larger, the mem_map correction would subtract too much and end up below what >>> was allocated for node_mem_map, no? The bug report behind this patch said that >>> first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow >>> translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right? >>> Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined >>> properly on arm... >>> >>> If anyone can confirm my doubts or point me to what I'm missing, thanks. >> >> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise >> I think plenty of other things are broken given how many architectures >> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET >> makes it obvious why the adjustment is being made. >> >> Thanks, >> Laura >> > > I was incorrect before: it isn't just NOMMU but architectures that don't use > asm-generic/memory_model.h which failed to compile. I could respin with Hm I see, some architectures use own variant of page_to_pfn, that's why it's being used in the if () check. > more ifdefery around the ARCH_PFN_OFFSET if that sounds reasonable. So I think your v2 might be correct already. Unless there's an architecture that defines CONFIG_FLATMEM and not CONFIG_HAVE_MEMBLOCK_NODE_MAP and places memmap somewhere else than pgdat->node_start_pfn, which would trigger the check for a wrong reason after the patch. Looks like arm is an arch that doesn't define CONFIG_HAVE_MEMBLOCK_NODE_MAP, yet it defines ARCH_PFN_OFFSET. With your patch it would correct memmap by the calculated offset, not the ARCH_PFN_OFFSET constant. Are these two the same then? Should there be something like a VM_BUG_ON that ARCH_PFN_OFFSET (if it exists) is indeed equal to the calculated offset? Or maybe a more general VM_BUG_ON checking that after any correction we make, the (page_to_pfn(mem_map) == pgdat->node_start_pfn) condition holds? > Thanks, > Laura > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>