On 05.01.21 08:50, Michal Hocko wrote:
> On Mon 04-01-21 21:17:43, Dan Williams wrote:
>> On Mon, Jan 4, 2021 at 2:45 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> [...]
>>> I believe Dan mentioned somewhere that he wants to see a real instance
>>> of this producing a BUG before actually moving forward with a fix. I
>>> might be wrong.
>>
>> I think I'm missing an argument for the user-visible effects of the
>> "Bad." statements above. I think soft_offline_page() is a candidate
>> for a local fix because mm/memory-failure.c already has a significant
>> amount of page-type specific knowledge. So teaching it "yes" for
>> MEMORY_DEVICE_PRIVATE-ZONE_DEVICE and "no" for other ZONE_DEVICE seems
>> ok to me.
>
> I believe we do not want to teach _every_ pfn walker about zone device
> pages. This would be quite error prone. Especially when a missing check
> could lead to a silently broken data or BUG_ON with debugging enabled
> (which is not the case for many production users). Or are we talking
> about different bugs here?

I'd like us to stick to the documentation, e.g., include/linux/mmzone.h

"
pfn_valid() is meant to be able to tell if a given PFN has valid memmap
associated with it or not. This means that a struct page exists for this
pfn. The caller cannot assume the page is fully initialized in general.
Hotplugable pages might not have been onlined yet. pfn_to_online_page()
will ensure the struct page is fully online and initialized. Special pages
(e.g. ZONE_DEVICE) are never onlined and should be treated accordingly.
"

-- 
Thanks,

David / dhildenb
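
To illustrate the contract quoted from include/linux/mmzone.h, here is a
toy userspace model. It is not kernel code: every toy_* name, the state
enum, and the four-entry table are made up for illustration. It only
sketches the idea that a generic pfn walker which goes through a
pfn_to_online_page()-style helper skips ZONE_DEVICE and not-yet-onlined
pages without needing any page-type specific knowledge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all toy_* identifiers are hypothetical, not kernel API. */
enum toy_pfn_state {
	TOY_NO_MEMMAP,   /* no struct page at all: pfn_valid() would be false */
	TOY_OFFLINE,     /* hotplugged memory that has not been onlined yet   */
	TOY_ONLINE,      /* ordinary memory, fully online and initialized     */
	TOY_ZONE_DEVICE, /* has a memmap, but is never onlined                */
};

struct toy_page {
	unsigned long pfn;
};

#define TOY_NR_PFNS 4

/* One pfn per state, so each case in the quoted documentation is covered. */
const enum toy_pfn_state toy_state[TOY_NR_PFNS] = {
	TOY_NO_MEMMAP, TOY_OFFLINE, TOY_ONLINE, TOY_ZONE_DEVICE,
};

struct toy_page toy_memmap[TOY_NR_PFNS] = { {0}, {1}, {2}, {3} };

/* Models pfn_valid(): true whenever a memmap entry exists for the pfn. */
bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_NR_PFNS && toy_state[pfn] != TOY_NO_MEMMAP;
}

/* Models pfn_to_online_page(): a page only if it is online and initialized. */
struct toy_page *toy_pfn_to_online_page(unsigned long pfn)
{
	if (!toy_pfn_valid(pfn) || toy_state[pfn] != TOY_ONLINE)
		return NULL;
	return &toy_memmap[pfn];
}

/* A generic walker: counts the pages it is allowed to touch. */
unsigned long toy_count_walkable(void)
{
	unsigned long pfn, count = 0;

	for (pfn = 0; pfn < TOY_NR_PFNS; pfn++) {
		if (toy_pfn_to_online_page(pfn))
			count++;
	}
	return count;
}

/* Sanity check of the model: valid memmap does not imply an online page. */
void toy_self_check(void)
{
	assert(toy_pfn_valid(3));
	assert(toy_pfn_to_online_page(3) == NULL);
}
```

In this model pfns 1 and 3 pass the pfn_valid()-style check but are still
rejected by the online-page helper, which is the distinction the quoted
documentation draws.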