On 1/11/24 10:03 AM, Matthew Wilcox wrote:
On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
On 1/11/24 9:34 AM, Jiaqi Yan wrote:
- if (!folio_test_has_hwpoisoned(folio))
+ if (!folio_test_hwpoison(folio))
Sidhartha, just curious why this change is needed? Does
PageHasHWPoisoned change after commit
"a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
No, it's not an issue with PageHasHWPoisoned(). The original code is testing for
the wrong flag, and I realized that has_hwpoisoned and hwpoison are two
different flags. The memory-failure code calls folio_test_set_hwpoison() to
set the hwpoison flag and does not set the has_hwpoisoned flag. When
debugging, I realized this if statement was never true despite the code
hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
From page-flags.h:
#ifdef CONFIG_MEMORY_FAILURE
PG_hwpoison, /* hardware poisoned page. Don't touch */
#endif
folio_test_hwpoison() checks this flag ^^^
/* At least one page in this folio has the hwpoison flag set */
PG_has_hwpoisoned = PG_error,
while folio_test_has_hwpoisoned() checks this flag ^^^
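To make the distinction concrete, here is a minimal userspace sketch of the two flags. It is not kernel code: every model_* name is hypothetical, and the bit positions are illustrative. The only detail taken from page-flags.h is that has_hwpoisoned aliases PG_error while hwpoison is a separate bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: bit positions are illustrative, not the kernel's. */
enum model_pageflags {
	MODEL_PG_error,
	MODEL_PG_hwpoison,                        /* models PG_hwpoison */
	MODEL_PG_has_hwpoisoned = MODEL_PG_error, /* aliases PG_error, as in page-flags.h */
};

/* Hypothetical stand-in for a folio: just a flags word. */
struct model_folio {
	unsigned long flags;
};

/* Models folio_test_set_hwpoison(): set the bit, return its previous value. */
static bool model_test_set_hwpoison(struct model_folio *f)
{
	bool old = f->flags & (1UL << MODEL_PG_hwpoison);

	f->flags |= 1UL << MODEL_PG_hwpoison;
	return old;
}

/* Models folio_test_hwpoison(): checks the hwpoison bit. */
static bool model_test_hwpoison(const struct model_folio *f)
{
	return f->flags & (1UL << MODEL_PG_hwpoison);
}

/* Models folio_test_has_hwpoisoned(): checks the other bit. */
static bool model_test_has_hwpoisoned(const struct model_folio *f)
{
	return f->flags & (1UL << MODEL_PG_has_hwpoisoned);
}
```

After model_test_set_hwpoison() runs on a zeroed model_folio, model_test_hwpoison() is true and model_test_has_hwpoisoned() is still false, which matches the "if statement was never true" behavior described above.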
So what you're saying is that hugetlb behaves differently from THP
with how memory-failure sets the flags?
I think so, in memory_failure() THP goes through this path:
hpage = compound_head(p);
if (PageTransHuge(hpage)) {
/*
* The flag must be set after the refcount is bumped
* otherwise it may race with THP split.
* And the flag can't be set in get_hwpoison_page() since
* it is called by soft offline too and it is just called
* for !MF_COUNT_INCREASED. So here seems to be the best
* place.
*
* Don't need care about the above error handling paths for
* get_hwpoison_page() since they handle either free page
* or unhandlable page. The refcount is bumped iff the
* page is a valid handlable page.
*/
SetPageHasHWPoisoned(hpage);
which sets the has_hwpoisoned flag, while hugetlb goes through
folio_set_hugetlb_hwpoison(), which calls folio_test_set_hwpoison().
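The two paths can be sketched in userspace as follows. This is a rough model under stated assumptions, not the memory-failure implementation: all sketch_* and SKETCH_* names are hypothetical, and it tracks only the head-page flags discussed in this thread, ignoring refcounts, locking, and per-subpage state.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_HWPOISON       (1UL << 0) /* models PG_hwpoison */
#define SKETCH_HAS_HWPOISONED (1UL << 1) /* models PG_has_hwpoisoned */

enum sketch_kind { SKETCH_THP, SKETCH_HUGETLB };

/* Hypothetical stand-in for a large folio: its kind and head-page flags. */
struct sketch_folio {
	enum sketch_kind kind;
	unsigned long head_flags;
};

/* Models which head-page flag each path sets, as described in the thread. */
static void sketch_memory_failure(struct sketch_folio *f)
{
	if (f->kind == SKETCH_THP)
		f->head_flags |= SKETCH_HAS_HWPOISONED; /* SetPageHasHWPoisoned(hpage) */
	else
		f->head_flags |= SKETCH_HWPOISON;       /* folio_test_set_hwpoison() */
}

/* Tests the flag that matches how this kind of folio gets marked. */
static bool sketch_folio_poisoned(const struct sketch_folio *f)
{
	unsigned long mask = (f->kind == SKETCH_THP) ?
			     SKETCH_HAS_HWPOISONED : SKETCH_HWPOISON;

	return f->head_flags & mask;
}
```

In this model, a hugetlb folio that has gone through sketch_memory_failure() never has SKETCH_HAS_HWPOISONED set, so a check against that bit alone would miss it.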