On 11/15/21 00:40, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> Originally mf_mutex is introduced to serialize multiple MCE events, but
> it is not that useful to allow unpoison to run in parallel with memory_failure()
> and soft offline. So apply mf_mutex to soft offline and unpoison.
> The memory failure handler and soft offline handler get simpler with this.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Reviewed-by: Yang Shi <shy828301@xxxxxxxxx>
> ---
> ChangeLog v4:
> - fix type in commit description.
>
> ChangeLog v3:
> - merge with "mm/hwpoison: remove race consideration"
> - update description
>
> ChangeLog v2:
> - add mutex_unlock() in "page already poisoned" path in soft_offline_page().
>   (Thanks to Ding Hui)
> ---
>  mm/memory-failure.c | 62 +++++++++++++--------------------------------
>  1 file changed, 18 insertions(+), 44 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e8c38e27b753..d29c79de6034 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c

Thanks for working on this.  I tried to exercise memory error handling for
hugetlb pages and ran into issues addressed by these patches.

> @@ -1507,14 +1507,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	lock_page(head);
>  	page_flags = head->flags;
>  
> -	if (!PageHWPoison(head)) {
> -		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
> -		num_poisoned_pages_dec();
> -		unlock_page(head);
> -		put_page(head);
> -		return 0;
> -	}
> -
>  	/*
>  	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>  	 * simply disable it. In order to make it work properly, we need
> @@ -1628,6 +1620,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	return rc;
>  }
>  
> +static DEFINE_MUTEX(mf_mutex);

There are only two other places where PageHWPoison is modified without
the mutex.  They are:

- test_and_clear_pmem_poison
  I 'think' pmem error handling is done separately, so this does not apply.
- clear_hwpoisoned_pages
  Called before removing memory (and deleting memmap) to reconcile the
  count of poisoned pages.  This should not be an issue, and technically
  I do not think the ClearPageHWPoison() is actually needed in this routine.

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
-- 
Mike Kravetz
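A minimal user-space sketch of the serialization this patch relies on, assuming pthreads in place of the kernel mutex API. struct fake_page, mf_lock, fake_memory_failure(), fake_soft_offline() and fake_unpoison() are hypothetical names, not kernel code. Because all three handlers take the same lock, unpoison cannot clear the poison flag while the failure handler is running, which is why the "just unpoisoned" recheck in memory_failure_hugetlb() can be dropped. Every early-return path must also release the lock, as the v2 changelog fix for soft_offline_page() does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page and its PageHWPoison bit (not kernel code). */
struct fake_page {
	bool hwpoison;
};

/* One lock shared by all three handlers, playing the role of mf_mutex. */
static pthread_mutex_t mf_lock = PTHREAD_MUTEX_INITIALIZER;
static long num_poisoned;	/* stand-in for the poisoned-page counter */

/* Returns 0 if the page was newly poisoned, -1 if it already was. */
int fake_memory_failure(struct fake_page *p)
{
	int ret = 0;

	pthread_mutex_lock(&mf_lock);
	if (p->hwpoison) {
		ret = -1;
	} else {
		p->hwpoison = true;
		num_poisoned++;
		/*
		 * fake_unpoison() needs mf_lock too, so the flag cannot be
		 * cleared while we hold it: no "just unpoisoned" recheck.
		 */
	}
	pthread_mutex_unlock(&mf_lock);
	return ret;
}

/* Returns 0 if the page was offlined, -1 if it was already poisoned. */
int fake_soft_offline(struct fake_page *p)
{
	pthread_mutex_lock(&mf_lock);
	if (p->hwpoison) {
		/* The "already poisoned" early return must drop the lock too. */
		pthread_mutex_unlock(&mf_lock);
		return -1;
	}
	p->hwpoison = true;
	num_poisoned++;
	pthread_mutex_unlock(&mf_lock);
	return 0;
}

/* Returns 0 if the poison was cleared, -1 if the page was not poisoned. */
int fake_unpoison(struct fake_page *p)
{
	int ret = -1;

	pthread_mutex_lock(&mf_lock);
	if (p->hwpoison) {
		/* Flag and counter change together under the lock. */
		assert(num_poisoned > 0);
		p->hwpoison = false;
		num_poisoned--;
		ret = 0;
	}
	pthread_mutex_unlock(&mf_lock);
	return ret;
}
```

With a single lock, the flag and the counter can only change together, so a handler never observes a page that was unpoisoned halfway through its own processing. The cost is that unrelated pages are also handled one at a time, which seems acceptable for a rare error path like this one.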