On 2022/5/12 20:59, David Hildenbrand wrote: > On 12.05.22 13:13, Miaohe Lin wrote: >> On 2022/5/12 15:28, David Hildenbrand wrote: >>>>>>> >>>>>>> Once the problematic DIMM would actually get unplugged, the memory block devices >>>>>>> would get removed as well. So when hotplugging a new DIMM in the same >>>>>>> location, we could online that memory again. >>>>>> >>>>>> What about PG_hwpoison flags? struct pages are also freed and reallocated >>>>>> in the actual DIMM replacement? >>>>> >>>>> Once memory is offline, the memmap is stale and is no longer >>>>> trustworthy. It gets reinitialize during memory onlining -- so any >>>>> previous PG_hwpoison is overridden at least there. In some setups, we >>>>> even poison the whole memmap via page_init_poison() during memory offlining. >>>>> >>>>> Apart from that, we should be freeing the memmap in all relevant cases >>>>> when removing memory. I remember there are a couple of corner cases, but >>>>> we don't really have to care about that. >>>> >>>> OK, so there seems no need to manipulate struct pages for hwpoison in >>>> all relevant cases. >>> >>> Right. When offlining a memory block, all we have to do is remember if >>> we stumbled over a hwpoisoned page and rememebr that inside the memory >>> block. Rejecting to online is then easy. >> >> BTW: How should we deal with the below race window: >> >> CPU A CPU B CPU C >> accessing page while hold page refcnt >> memory_failure happened on page >> offline_pages >> page can be offlined due to page refcnt >> is ignored when PG_hwpoison is set >> can still access page struct... >> >> Any in use page (with page refcnt incremented) might be offlined while its content, e.g. flags, private ..., can >> still be accessed if the above race happened. Is this possible? Or am I miss something? Any suggestion to fix it? >> I can't figure out a way yet. :( > > I assume you mean that test_pages_isolated() essentially only checks for > PageHWPoison() and doesn't care about the refcount? Yes, page refcount is ignored when PG_HWPoison is set. > > That part is very dodgy and it's part of my motivation to question that > whole handling in the first place. > > > In do_migrate_range(), there is a comment: > > " > HWPoison pages have elevated reference counts so the migration would > fail on them. It also doesn't make any sense to migrate them in the > first place. Still try to unmap such a page in case it is still mapped > (e.g. current hwpoison implementation doesn't unmap KSM pages but keep > the unmap as the catch all safety net). > " > > My assumption would be: if there are any unexpected references on a > hwpoison page, we must fail offlining. Ripping out the page might be > more harmful then just leaving it in place and failing offlining for the > time being. I tend to agree with this. :) > > > > I am no export on PageHWPoison(). Which guarantees do we have regarding > the page count? > > If we succeed in unmapping the page, there shouldn't be any references > from the page tables. We might still have GUP references to such pages, > and it would be fair enough to fail offlining. I remember we try > removing the page from the pagecache etc. to free up these references. > So which additional references do we have that the comment in offlining > code talks about? A single additional one from hwpoison code? IIRC, memory_failure will hold one extra page refcount. This refcount will be released in unpoison_memory. > > Once we figure that out, we might tweak test_pages_isolated() to also > consider the page count and not rip out random pages that are still > referenced in the system. > But there are some corner cases where PageHWPoison is set but page refcnt is not increased. So we couldn't detect the page refcount reliably now. :( Thanks!