On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand <david@xxxxxxxxxx> wrote:

> I think I just found an issue with try_offline_node().
> try_offline_node() is pretty much broken already (touches garbage
> memmaps and will not consider mixed NIDs within sections), however,
> it relies on the node span to look for memory sections to probe. So it
> seems to rely on the nodes getting shrunk when removing memory, not when
> offlining.
>
> As we shrink the node span when offlining now and not when removing,
> this can go wrong once we offline the last memory block of the node and
> offline the last CPU. We could still have memory around that we could
> re-online, however, the node would already be offline. Unlikely, but
> possible.
>
> Note that the same is also broken without this patch in case memory is
> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the
> garbage memmap, resulting in no memory being detected as belonging to
> the node. Also, resize_pgdat_range() is called when onlining memory, not
> when adding it. :/ Oh this is so broken :)
>
> The right fix is probably to walk over all memory blocks that could
> exist and test if they belong to the nid (if offline, check the
> block->nid, if online check all pageblocks). A fix we can then move in
> front of this patch.
>
> Will look into this this week.

And this series shows almost no sign of having been reviewed. I'll hold
it over for 5.6.