On 06.01.21 11:42, Michal Hocko wrote: > On Wed 06-01-21 10:56:19, David Hildenbrand wrote: > [...] >> Note that this is not sufficient in the general case. I already >> mentioned that we effectively override an already initialized memmap. >> >> --- >> >> [ SECTION ] >> Before: >> [ ZONE_NORMAL ][ Hole ] >> >> The hole has some node/zone (currently 0/0, discussions ongoing on how >> to optimize that to e.g., ZONE_NORMAL in this example) and is >> PG_reserved - looks like an ordinary memory hole. >> >> After memremap: >> [ ZONE_NORMAL ][ ZONE_DEVICE ] >> >> The already initialized memmap was converted to ZONE_DEVICE. Your >> slowpath will work. >> >> After memunmap (no poisioning): >> [ ZONE_NORMAL ][ ZONE_DEVICE ] >> >> The slow path is no longer working. pfn_to_online_page() might return >> something that is ZONE_DEVICE. >> >> After memunmap (poisioning): >> [ ZONE_NORMAL ][ POISONED ] >> >> The slow path is no longer working. pfn_to_online_page() might return >> something that will BUG_ON via page_to_nid() etc. >> >> --- >> >> Reason is that pfn_to_online_page() does no care about sub-sections. And >> for now, it didn't had to. If there was an online section, it either was >> >> a) Completely present. The whole memmap is initialized to sane values. >> b) Partially present. The whole memmap is initialized to sane values. >> >> memremap/memunmap messes with case b) > > I do not see we ever clear the newly added flag and my understanding is > that the subsection removed would lead to get_dev_pagemap returning a > NULL. Which would obviously need to be checked for pfn_to_online_page. > Or do I miss anything and the above is not the case and we could still > get false positives? See my example above ("After memunmap"). We're still in the slow pathg. pfn_to_online_page() will return a struct page as get_dev_pagemap() is now NULL. Yet page_zone(page) will either - BUG_ON (memmap was poisoned) - return ZONE_DEVICE zone (memmap not poisoned when memunmapping) As I said, can be tackled by checking for pfn_section_valid() at least on the slow path. Ideally also on the fast path. > >> Well have to further tweak pfn_to_online_page(). You'll have to also >> check pfn_section_valid() *at least* on the slow path. Less-hacky would >> be checking it also in the "somehwat-faster" path - that would cover >> silently overriding a memmap that's visible via pfn_to_online_page(). >> Might slow down things a bit. >> >> >> Not completely opposed to this, but I would certainly still prefer just >> avoiding this corner case completely instead of patching around it. Thanks! > > Well, I would love to have no surprises either. So far there was not > actual argument why the pmem reserved space cannot be fully initialized. Yes, I'm still hoping Dan can clarify that. > On the other hand making sure that pfn_to_online_page sounds like the > right thing to do. And having an explicit check for zone device there in > a slow path makes sense to me. As I said, I'd favor to simplify and just get rid of the special case, instead of coming up with increasingly complex ways to deal with it. pfn_to_online_page() used to be simple, essentially checking a single flag was sufficient in most setups. -- Thanks, David / dhildenb