>>>> Once the problematic DIMM actually gets unplugged, the memory block devices
>>>> would get removed as well. So when hotplugging a new DIMM in the same
>>>> location, we could online that memory again.
>>>
>>> What about PG_hwpoison flags? struct pages are also freed and reallocated
>>> in the actual DIMM replacement?
>>
>> Once memory is offline, the memmap is stale and is no longer
>> trustworthy. It gets reinitialized during memory onlining -- so any
>> previous PG_hwpoison is overridden at least there. In some setups, we
>> even poison the whole memmap via page_init_poison() during memory offlining.
>>
>> Apart from that, we should be freeing the memmap in all relevant cases
>> when removing memory. I remember there are a couple of corner cases, but
>> we don't really have to care about that.
>
> OK, so there seems to be no need to manipulate struct pages for hwpoison in
> all relevant cases.

Right. When offlining a memory block, all we have to do is remember whether we
stumbled over a hwpoisoned page, and record that inside the memory block.
Refusing to online is then easy.

-- 
Thanks,

David / dhildenb
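
For illustration only, here is a minimal user-space sketch of the bookkeeping discussed above. It is not kernel code: struct page_model, struct memory_block_model, the saw_hwpoison field and the three helpers are all hypothetical names made up for this example, and -EBUSY is an arbitrary choice of error code. It only models the idea that offlining records whether a hwpoisoned page was seen, that onlining is refused while the record is set, and that replacing the DIMM yields a fresh memory block without the record.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real structures. */
struct page_model {
	bool hwpoison;			/* models PG_hwpoison */
};

struct memory_block_model {
	struct page_model *pages;	/* models the block's part of the memmap */
	size_t nr_pages;
	bool online;
	bool saw_hwpoison;		/* recorded while offlining (assumed field) */
};

/* Offlining: remember whether any page in the block was hwpoisoned. */
static int block_offline(struct memory_block_model *mb)
{
	for (size_t i = 0; i < mb->nr_pages; i++) {
		if (mb->pages[i].hwpoison)
			mb->saw_hwpoison = true;
	}
	mb->online = false;
	return 0;
}

/* Onlining: refuse if the block recorded a hwpoisoned page earlier. */
static int block_online(struct memory_block_model *mb)
{
	if (mb->saw_hwpoison)
		return -EBUSY;		/* error code chosen arbitrarily here */

	/* The memmap is reinitialized on onlining, so stale flags go away. */
	for (size_t i = 0; i < mb->nr_pages; i++)
		mb->pages[i].hwpoison = false;

	mb->online = true;
	return 0;
}

/*
 * DIMM unplug followed by hotplug of a new DIMM: the old memory block
 * device goes away and a fresh one, without the record, takes its place.
 */
static void block_replace(struct memory_block_model *mb)
{
	mb->online = false;
	mb->saw_hwpoison = false;
}
```

In this model no per-page state has to be carried across offline/online: the single flag in the block is enough to reject onlining until the DIMM has been replaced.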