On Tue, 2019-09-17 at 11:49 -0400, Waiman Long wrote:
> On 9/17/19 3:13 AM, David Hildenbrand wrote:
> > On 17.09.19 04:34, Toshiki Fukasawa wrote:
> > > On 2019/09/09 16:46, David Hildenbrand wrote:
> > > > Let's take a step back here to understand the issues I am aware of. I
> > > > think we should solve this for good now:
> > > >
> > > > A PFN walker takes a look at a random PFN at a random point in time. It
> > > > finds a PFN with SECTION_MARKED_PRESENT && !SECTION_IS_ONLINE. The
> > > > options are:
> > > >
> > > > 1. It is buddy memory (add_memory()) that has not been online yet. The
> > > > memmap contains garbage. Don't access.
> > > >
> > > > 2. It is ZONE_DEVICE memory with a valid memmap. Access it.
> > > >
> > > > 3. It is ZONE_DEVICE memory with an invalid memmap, because the section
> > > > is only partially present: E.g., device starts at offset 64MB within a
> > > > section or the device ends at offset 64MB within a section. Don't access it.
> > >
> > > I don't agree with case #3. In the case, struct page area is not allocated on
> > > ZONE_DEVICE, but is allocated on system memory. So I think we can access the
> > > struct pages. What do you mean "invalid memmap"?
> >
> > No, that's not the case. There is no memory, especially not system
> > memory. We only allow partially present sections (sub-section memory
> > hotplug) for ZONE_DEVICE.
> >
> > invalid memmap == memmap was not initialized == struct pages contains
> > garbage. There is a memmap, but accessing it (e.g., pfn_to_nid()) will
> > trigger a BUG.
> >
>
> As long as the page structures exist, they should be initialized to some
> known state. We could set PagePoison for those invalid memmap. It is the

Sounds like you want to run page_init_poison() by default.

> garbage that are in those page structures that can cause problem if a
> struct page walker scan those pages and try to make sense of it.
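
For reference, the three options David lists for a PFN in a
SECTION_MARKED_PRESENT && !SECTION_IS_ONLINE section can be written down
as a small decision function. This is only a userspace sketch, not kernel
code: struct section_model, its fields, enum pfn_verdict and
classify_pfn() are all hypothetical names made up for illustration. It
also assumes the walker already knows whether the section is ZONE_DEVICE
and whether the sub-section holding the PFN is populated, which is
exactly the information a real walker does not get from the section
flags alone.

```c
#include <stdbool.h>

/* Hypothetical model of what a PFN walker would need to know. */
struct section_model {
    bool present;            /* models SECTION_MARKED_PRESENT */
    bool online;             /* models SECTION_IS_ONLINE */
    bool zone_device;        /* assumption: section backs ZONE_DEVICE memory */
    bool subsection_present; /* assumption: the PFN's sub-section is populated */
};

enum pfn_verdict {
    PFN_SKIP,   /* memmap missing or uninitialized: do not touch struct page */
    PFN_ACCESS  /* memmap is initialized: struct page may be read */
};

static enum pfn_verdict classify_pfn(const struct section_model *s)
{
    if (!s->present)
        return PFN_SKIP;        /* no section, so no memmap to look at */

    if (s->online)
        return PFN_ACCESS;      /* assumed: online sections have a valid memmap */

    /* From here on: present && !online, the state discussed in the thread. */
    if (!s->zone_device)
        return PFN_SKIP;        /* case 1: added but not yet onlined buddy memory */

    if (s->subsection_present)
        return PFN_ACCESS;      /* case 2: ZONE_DEVICE with a valid memmap */

    return PFN_SKIP;            /* case 3: ZONE_DEVICE, partially present section */
}
```

With this model, a section that is present, not online and not
ZONE_DEVICE yields PFN_SKIP (case 1). A present, not online ZONE_DEVICE
section yields PFN_ACCESS only when the sub-section holding the PFN is
populated (case 2), and PFN_SKIP otherwise (case 3).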