On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote: > On 4/26/24 10:34 AM, Matthew Wilcox wrote: > > On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote: > > > Use a folio in get_any_page() to save 5 calls to compound head and > > > convert the last user of shake_page() to shake_folio(). This allows us > > > to remove the shake_page() definition. > > > > So I didn't do this before because I wasn't convinced it was safe. > > We don't have a refcount on the folio, so the page might no longer > > be part of this folio by the time we get the refcount on the folio. > > > > I'd really like to see some argumentation for why this is safe. > > If I moved down the folio = page_folio() line to after we verify > __get_hwpoison_page() has returned 1, which indicates the reference count > was successfully incremented via foliO_try_get(), that means the folio > conversion would happen after we have a refcount. In the case we don't call > __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This > means the page has existing users so that path would be safe as well. So I > think this is safe after moving page_folio() after __get_hwpoison_page(). See if you can find a hole in this chain of reasoning ... memory_failure() p = pfn_to_online_page(pfn); res = try_memory_failure_hugetlb(pfn, flags, &hugetlb); (not a hugetlb) if (TestSetPageHWPoison(p)) { (not already poisoned) if (!(flags & MF_COUNT_INCREASED)) { res = get_hwpoison_page(p, flags); get_hwpoison_page() ret = get_any_page(p, flags); get_any_page() folio = page_folio(page) Because we don't have a reference on the folio at this point (how could we?), the folio might be split, and now we have a pointer to a folio which no longer contains the page (assuming we had a hwerror in what was a tail page at this time).