On Fri, Apr 26, 2024 at 11:53:01AM -0700, Sidhartha Kumar wrote: > On 4/26/24 11:27 AM, Matthew Wilcox wrote: > > On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote: > > > On 4/26/24 10:34 AM, Matthew Wilcox wrote: > > > > On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote: > > > > > Use a folio in get_any_page() to save 5 calls to compound head and > > > > > convert the last user of shake_page() to shake_folio(). This allows us > > > > > to remove the shake_page() definition. > > > > > > > > So I didn't do this before because I wasn't convinced it was safe. > > > > We don't have a refcount on the folio, so the page might no longer > > > > be part of this folio by the time we get the refcount on the folio. > > > > > > > > I'd really like to see some argumentation for why this is safe. > > > > > > If I moved down the folio = page_folio() line to after we verify > > > __get_hwpoison_page() has returned 1, which indicates the reference count > > > was successfully incremented via foliO_try_get(), that means the folio > > > conversion would happen after we have a refcount. In the case we don't call > > > __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This > > > means the page has existing users so that path would be safe as well. So I > > > think this is safe after moving page_folio() after __get_hwpoison_page(). > > > > See if you can find a hole in this chain of reasoning ... > > > > memory_failure() > > p = pfn_to_online_page(pfn); > > res = try_memory_failure_hugetlb(pfn, flags, &hugetlb); > > (not a hugetlb) > > if (TestSetPageHWPoison(p)) { > > (not already poisoned) > > if (!(flags & MF_COUNT_INCREASED)) { > > res = get_hwpoison_page(p, flags); > > > > get_hwpoison_page() > > ret = get_any_page(p, flags); > > > > get_any_page() > > folio = page_folio(page) > > That would be unsafe, the safe way would be if we moved page_folio() after > the call to __get_hw_poison() in get_any_page() and there would still be one > remaining user of shake_page() that we can't convert. A safe version of this > patch would result in a removal of one use of PageHuge() and two uses of > put_page(), would that be worth submitting? > > get_any_page() > if(__get_hwpoison_page()) > folio = page_folio() /* folio_try_get() returned 1, safe */ I think we should convert __get_hwpoison_page() to return either the folio or an ERR_PTR or NULL. Also, I think we should delete the "cannot catch tail" part and just loop in __get_hwpoison_page() until we do catch it. See try_get_folio() in mm/gup.c for inspiration (although you can't use it exactly because that code knows that the page is mapped into a page table, so has a refcount). But that's just an immediate assessment; you might find a reason that doesn't work.