On Wed, Oct 06, 2021 at 04:57:38PM -0700, Yang Shi wrote: > > For example, I see that both unpoison_memory() and soft_offline_page() will > > call it too, does it mean that we'll also set the bits e.g. even when we want > > to inject an unpoison event too? > > unpoison_memory() should be not a problem since it will just bail out > once THP is met as the comment says: > > /* > * unpoison_memory() can encounter thp only when the thp is being > * worked by memory_failure() and the page lock is not held yet. > * In such case, we yield to memory_failure() and make unpoison fail. > */ But I still think setting the subpage-hwpoison bit hides too deep there, it'll be great we can keep get_hwpoison_page() as simple as a safe version of getting the refcount of the page we want. Or we'd still better touch up the comment above get_hwpoison_page() to show that side effect. > > > And I think we should set the flag for soft offline too, right? The I'm not familiar with either memory failure or soft offline, so far it looks right to me. However.. > soft offline does set the hwpoison flag for the corrupted sub page and > doesn't split file THP, .. I believe this will become not true after your patch 5, right? > so it should be captured by page fault as well. And yes for poison injection. One more thing: besides thp split and page free, do we need to conditionally drop the HasHwpoisoned bit when received an unpoison event? If my understanding is correct, we may need to scan all the subpages there, to make sure HasHwpoisoned bit reflects the latest status for the thp in question. > > But your comment reminds me that get_hwpoison_page() is just called > when !MF_COUNT_INCREASED, so it means MADV_HWPOISON still could > escape. This needs to be covered too. Right, maybe that's also a clue that we shouldn't set the new page flag within get_hwpoison_page(), since get_hwpoison_page() is actually well coupled with MF_COUNT_INCREASED and all of them are only about refcounting of the pages. -- Peter Xu