Re: [PATCH] mm/hotplug: invalid PFNs from pfn_to_online_page()

On Mon 21-01-19 12:58:46, Qian Cai wrote:
> 
> 
> On 1/21/19 11:38 AM, Qian Cai wrote:
> > 
> > 
> > On 1/21/19 4:53 AM, Michal Hocko wrote:
> >> On Thu 17-01-19 21:16:50, Qian Cai wrote:
> >>> On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
> >>> with CONFIG_DEBUG_VM_PGFLAGS=y because page_to_nid() found a pfn that
> >>> is not directly mapped (MEMBLOCK_NOMAP). Hence, the page->flags is
> >>> uninitialized.
> >>>
> >>> This is because commit 9f1eb38e0e11 ("mm, kmemleak: little
> >>> optimization while scanning") started to use pfn_to_online_page()
> >>> instead of pfn_valid(). However, in the CONFIG_MEMORY_HOTPLUG=y case,
> >>> pfn_to_online_page() does not call memblock_is_map_memory() while
> >>> pfn_valid() does.
> >>
> >> How come there is an online section which has a pfn with
> >> pfn_valid==F? We do allocate the full section worth of struct pages
> >> so there is a valid struct page. Is there any hole inside this section?
> > 
> > It has CONFIG_HOLES_IN_ZONE=y.
> 
> Actually, this does not seem to have anything to do with holes.
> 
> 68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
> 
> This causes pages marked as nomap to no longer be reassigned to the new
> zone in memmap_init_zone() by calling __init_single_page().

Thanks for the pointer. This sheds some light but I cannot say I
understand all the details.

> There is an old discussion for this topic.
> https://lkml.org/lkml/2016/11/30/566

Hmm, I see. The documentation is not the best (mea culpa):
 * Return page for the valid pfn only if the page is online. All pfn
 * walkers which rely on the fully initialized page->flags and others
 * should use this rather than pfn_valid && pfn_to_page

This suggests that the pfn is expected to be _valid_ already when
pfn_to_online_page is used, and some callers indeed check that first.
Some of them don't, though, which is probably because the latter part of
the documentation suggests that it should replace pfn_valid &&
pfn_to_page. Thinking about this more, I guess we do not want to put an
additional burden on callers and require pfn_valid to be called as well.
That is just error prone and can lead to problems like this one.
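
To make the intended semantics concrete, here is a rough userspace
sketch, not the actual patch and not kernel code. Every toy_* name, the
section size and the NOMAP range are made up for illustration. It only
shows the idea that the lookup helper does the range check, the online
check and the validity check itself, so callers need nothing else:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every toy_* name and constant here is hypothetical. */
#define TOY_PFNS_PER_SECTION 8UL
#define TOY_NR_SECTIONS      4UL
#define TOY_MAX_PFN          (TOY_PFNS_PER_SECTION * TOY_NR_SECTIONS)

struct toy_page {
	unsigned long flags;
};

/* One struct page per pfn, i.e. a full section worth for every section. */
static struct toy_page toy_memmap[TOY_MAX_PFN];

/* Sections 0 and 1 are online, sections 2 and 3 are offline. */
static const bool toy_section_online[TOY_NR_SECTIONS] = {
	true, true, false, false
};

/* pfns 10 and 11 stand in for a NOMAP region inside online section 1. */
static const unsigned long toy_nomap_start = 10;
static const unsigned long toy_nomap_end = 12;	/* exclusive */

/* Stand-in for pfn_valid(): in range and not in the NOMAP region. */
static bool toy_pfn_valid(unsigned long pfn)
{
	if (pfn >= TOY_MAX_PFN)
		return false;
	return pfn < toy_nomap_start || pfn >= toy_nomap_end;
}

/*
 * Stand-in for pfn_to_online_page(): return a page only if the section
 * number is in range, the section is online, and the pfn itself is
 * valid. Callers do not have to call toy_pfn_valid() on their own.
 */
static struct toy_page *toy_pfn_to_online_page(unsigned long pfn)
{
	unsigned long nr = pfn / TOY_PFNS_PER_SECTION;

	if (nr >= TOY_NR_SECTIONS || !toy_section_online[nr])
		return NULL;
	if (!toy_pfn_valid(pfn))
		return NULL;
	return &toy_memmap[pfn];
}
```

With this model a walker such as the kmemleak scan would get NULL for
pfn 10 (online section, but NOMAP) and for pfn 20 (offline section),
and a usable page for pfn 3 or pfn 12, without a separate validity
check at the call site.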

So I agree with your change (modulo the range check) but please make
sure all this information makes it into the changelog.

Thanks!
-- 
Michal Hocko
SUSE Labs



