On 5/22/2024 1:37 PM, Oscar Salvador wrote:
On Tue, May 21, 2024 at 05:54:27PM -0600, Jane Chu wrote:
Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
Attempted to document the definition of various action names, and made a few
adjustments to the action_result() calls.
Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
This looks much better, thanks:
Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
By the way, I was checking the block in memory_failure() that handles
refcount=0 pages, concretely the piece of code that handles buddy pages.
In there, if we fail to take the page off the buddy lists, we return
MF_FAILED, but I really think we should be returning MF_IGNORED.
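
To make the question concrete, here is a rough standalone model of the refcount=0 branch as described above. It is only a sketch, not the kernel code: struct free_page_probe and free_page_result() are made-up names, the enum is a local stand-in for the kernel's result codes, the retry that the real handler may attempt is left out, and the result for the non-buddy case is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the kernel's result codes; the values are illustrative. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical snapshot of what the handler saw for a refcount=0 page. */
struct free_page_probe {
    bool is_free_buddy;    /* page sits on the buddy free lists */
    bool taken_off_buddy;  /* removing it from the free lists succeeded */
};

/*
 * Result for a refcount=0 page. 'on_removal_failure' is the value under
 * discussion: MF_FAILED as the code stands, MF_IGNORED as suggested.
 */
static enum mf_result free_page_result(const struct free_page_probe *p,
                                       enum mf_result on_removal_failure)
{
    assert(p != NULL);

    if (!p->is_free_buddy)
        return MF_IGNORED;        /* assumption: nothing was done to the page */
    if (p->taken_off_buddy)
        return MF_RECOVERED;      /* page can no longer be handed out */
    return on_removal_failure;    /* lost the race with the allocator */
}
```

Passing the failure value as a parameter just keeps both options side by side; the only case where they differ is a buddy page that could not be taken off the free lists.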
I guess you mean this code -
    if (has_extra_refcount(ps, p, false))
        ret = MF_FAILED;
?
It appears in the code paths below:
    hwpoison_user_mappings
    identify_page_state
    me_huge_page || me_swapcache_dirty || me_swapcache_clean
for LRU pages.
And for non-LRU pages:
    if (!folio_test_lru(folio) && !folio_test_writeback(folio))
        goto identify_page_state;
My hunch is that the most common calling path would be this:
hwpoison_user_mappings has unmapped the page, then identify_page_state
is called, but the handler for some reason fails to take the page off the
LRU. The m-f() handler has isolated the page to avoid further MCEs, so I
think returning MF_FAILED is okay in general.
That said, the line is not always clear. For example, in the non-LRU
case the m-f() handler may have done only a little. I guess I just need
to let the case rest.
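
For reference, the check quoted above can be modeled roughly as below. This is a simplified standalone sketch, not the kernel code: extra_refs_remain() and result_after_handler() are made-up names, the enum is a local stand-in, and I am assuming the check discounts one reference held by the handler plus, when extra pins are expected, one per page of the folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's result codes; the values are illustrative. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/*
 * Returns true when someone other than the handler still holds the page.
 * 'page_count' is the page's current reference count, 'nr_pages' the number
 * of pages in the folio.
 */
static bool extra_refs_remain(int page_count, bool extra_pins, int nr_pages)
{
    int count;

    assert(nr_pages >= 1);

    count = page_count - 1;       /* discount the handler's own reference */
    if (extra_pins)
        count -= nr_pages;        /* discount the expected extra pins */
    return count > 0;
}

/* Downgrade whatever the handler achieved if the page is still referenced. */
static enum mf_result result_after_handler(enum mf_result ret, int page_count,
                                           bool extra_pins, int nr_pages)
{
    if (extra_refs_remain(page_count, extra_pins, nr_pages))
        ret = MF_FAILED;
    return ret;
}
```

In this model the handler has already done its work by the time the result is downgraded, which is why MF_FAILED rather than MF_IGNORED seems the closer fit for these paths.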
thanks,
-jane
Thoughts?