Re: Issue on reserving memory with no-map flag in DT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/20/2015 12:57 AM, Laura Abbott wrote:
> On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
>> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>>
>> I admit I may not see clearly through all the arch-specific layers and various
>> config option combinations that are possible here, so I might be misinterpreting
>> the code. But I think the problem here is not insufficient allocation size, but
>> something else.
>>
>> The code above continues by this line:
>>
>> 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>>
>> So, size for the map allocation has already been calculated aligned to
>> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
>> actually present page, which might be offset from the perfect alignment. Your
>> patch adds another offset to the already aligned size (but you use
>> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
>> a mistake in itself?). So with your patch we have map of aligned size starting
>> from the node_mem_map. This means the last offset-worth of struct pages should
>> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
>> that extra padding to prevent crashing, then it looks really suspicious...
>>
>> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid)        \
>>          ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
> 
> With this hard coded debugging:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>                          map = memblock_virt_alloc_node_nopanic(size,
>                                                                 pgdat->node_id);
>                  pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> +                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> +               pr_err(">>> size calculated %lx\n", size);
> +               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
>          }
>   #ifndef CONFIG_NEED_MULTIPLE_NODES
>          /*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>          }
>   #endif
>   #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> +       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> +       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
>   }
>   
>   void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> 
> I get this output:
> [    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [    0.000000] >>> size calculated 1800000
> [    0.000000] >>> allocated region edffa000-ef7fa000
> [    0.000000] >>> pfn 200 page ee002000
> [    0.000000] >>> pfn bffff page ef7fdfe0
> 
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
> 
> 
> #define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
>                                   ARCH_PFN_OFFSET)

Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned to
mem_map.

> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
> 
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>           */
>          if (pgdat == NODE_DATA(0)) {
>                  mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
>                  if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>                          mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);

But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start) where "start" is just alignment of the
node_start_pfn to MAX_ORDER_NR_PAGES. But here we subtract whole
pgdat->node_start_pfn, minus a ARCH_PFN_OFFSET constant. Is the constant always
equeal to the earlier value of "start", which is calculated dynamically?.

So I agree that mem_map assignment should be fixed, but maybe not exactly like this?

>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
> Thanks,
> Laura
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]