On Thu, Apr 07, 2022 at 09:38:26PM +0800, Miaohe Lin wrote:
> On 2022/4/7 19:29, Naoya Horiguchi wrote:
...
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +	struct page *head = compound_head(page);
> > +	int ret = 2;	/* fallback to normal page handling */
> > +	bool count_increased = false;
> > +
> > +	if (!PageHeadHuge(head))
> > +		goto out;
> > +
> > +	if (flags & MF_COUNT_INCREASED) {
> > +		ret = 1;
> > +		count_increased = true;
> > +	} else if (HPageFreed(head) || HPageMigratable(head)) {
> > +		ret = get_page_unless_zero(head);
> > +		if (ret)
> > +			count_increased = true;
> > +	} else {
> > +		ret = -EBUSY;
> > +		goto out;
> > +	}
> > +
> > +	if (hwpoison_filter(page)) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> Now hwpoison_filter is done without lock_page + unlock_page. Is this OK,
> or is the lock_page + unlock_page pair indeed required?

Hmm, we had better call hwpoison_filter under the page lock for hugepages,
so I'll move this too, thank you. (A rough sketch of what I mean is below,
after the next point.)

> > +
> > +	if (TestSetPageHWPoison(head)) {
> > +		ret = -EHWPOISON;
> > +		goto out;
> > +	}
> 
> Without this patch, the page refcnt is not decremented if MF_COUNT_INCREASED
> is set in flags when PageHWPoison is already set. So I think this patch also
> fixes that issue. Thanks!

Good point, I didn't even notice that. And the issue still seems to exist
for the normal page case. Encountering the "already hwpoisoned" case from
madvise_inject_error() may be rare, but it could happen when the first call
failed to contain the error (which is then still accessible from the
calling process).
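For the normal page case, I guess the fix would be something like the
following in the "already hardware poisoned" branch of memory_failure()
(just a rough idea, untested; only the two lines around put_page() are
new):

	if (TestSetPageHWPoison(p)) {
		pr_err("Memory failure: %#lx: already hardware poisoned\n",
		       pfn);
		res = -EHWPOISON;
		if (flags & MF_ACTION_REQUIRED)
			res = kill_accessing_process(current, pfn, flags);
		/* drop the extra refcount taken by madvise_inject_error() */
		if (flags & MF_COUNT_INCREASED)
			put_page(p);
		goto unlock_mutex;
	}

This would probably be a separate patch.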
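And coming back to the hwpoison_filter() locking question above, I'm
thinking of calling the filter just after lock_page() in
try_memory_failure_hugetlb() instead, roughly like below (untested, so
the details may change):

	head = compound_head(p);
	lock_page(head);

	if (hwpoison_filter(p)) {
		/* PageHWPoison was already set under hugetlb_lock, undo it */
		if (TestClearPageHWPoison(head))
			num_poisoned_pages_dec();
		unlock_page(head);
		put_page(head);	/* drop the refcount we hold at this point */
		return -EOPNOTSUPP;
	}

This mirrors what the existing free-hugepage path does when the filter
matches (TestClearPageHWPoison + num_poisoned_pages_dec), but done under
the page lock.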
> > +
> > +	return ret;
> > +out:
> > +	if (count_increased)
> > +		put_page(head);
> > +	return ret;
> > +}
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +/*
> > + * Taking refcount of hugetlb pages needs extra care about race conditions
> > + * with basic operations like hugepage allocation/free/demotion.
> > + * So all necessary prechecks for hwpoison (like pinning, testing/setting
> > + * PageHWPoison, and hwpoison_filter) are done in single hugetlb_lock range.
> > + */
> > +static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
> >  {
> > -	struct page *p = pfn_to_page(pfn);
> > -	struct page *head = compound_head(p);
> >  	int res;
> > +	struct page *p = pfn_to_page(pfn);
> > +	struct page *head;
> >  	unsigned long page_flags;
> > +	bool retry = true;
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -		       pfn);
> > -		res = -EHWPOISON;
> > -		if (flags & MF_ACTION_REQUIRED)
> > +	*hugetlb = 1;
> > +retry:
> > +	res = get_huge_page_for_hwpoison(pfn, flags);
> > +	if (res == 2) { /* fallback to normal page handling */
> > +		*hugetlb = 0;
> > +		return 0;
> > +	} else if (res == -EOPNOTSUPP) {
> > +		return res;
> > +	} else if (res == -EHWPOISON) {
> > +		pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > +		if (flags & MF_ACTION_REQUIRED) {
> > +			head = compound_head(p);
> >  			res = kill_accessing_process(current, page_to_pfn(head), flags);
> > +		}
> > +		return res;
> > +	} else if (res == -EBUSY) {
> > +		if (retry) {
> > +			retry = false;
> > +			goto retry;
> > +		}
> > +		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> >  		return res;
> >  	}
> >  
> >  	num_poisoned_pages_inc();
> >  
> > -	if (!(flags & MF_COUNT_INCREASED)) {
> > -		res = get_hwpoison_page(p, flags);
> > -		if (!res) {
> > -			lock_page(head);
> > -			if (hwpoison_filter(p)) {
> > -				if (TestClearPageHWPoison(head))
> > -					num_poisoned_pages_dec();
> > -				unlock_page(head);
> > -				return -EOPNOTSUPP;
> > -			}
> > -			unlock_page(head);
> > -			res = MF_FAILED;
> > -			if (__page_handle_poison(p)) {
> > -				page_ref_inc(p);
> > -				res = MF_RECOVERED;
> > -			}
> > -			action_result(pfn, MF_MSG_FREE_HUGE, res);
> > -			return res == MF_RECOVERED ? 0 : -EBUSY;
> > -		} else if (res < 0) {
> > -			action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > -			return -EBUSY;
> > +	/*
> > +	 * Handling free hugepage. The possible race with hugepage allocation
> > +	 * or demotion can be prevented by PageHWPoison flag.
> > +	 */
> > +	if (res == 0) {
> > +		res = MF_FAILED;
> > +		if (__page_handle_poison(p)) {
> > +			page_ref_inc(p);
> > +			res = MF_RECOVERED;
> >  		}
> > +		action_result(pfn, MF_MSG_FREE_HUGE, res);
> > +		return res == MF_RECOVERED ? 0 : -EBUSY;
> >  	}
> >  
> > +	head = compound_head(p);
> >  	lock_page(head);
> >  
> >  	/*
> 
> IMHO, the below code could be removed now as we fetch the refcnt under the hugetlb_lock:
> 
> 	/*
> 	 * The page could have changed compound pages due to race window.
> 	 * If this happens just bail out.
> 	 */
> 	if (!PageHuge(p) || compound_head(p) != head) {
> 		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> 		res = -EBUSY;
> 		goto out;
> 	}
> 
> But this might be another patch.

I'll do this.

Thank you for the review and suggestions,
- Naoya Horiguchi