On 8/24/2023 12:09 PM, Matthew Wilcox wrote:
On Thu, Aug 24, 2023 at 11:24:20AM -0700, Jane Chu wrote:
On 8/22/2023 4:13 PM, Matthew Wilcox (Oracle) wrote:
[..]
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a6c3af985554..b81d6eb4e6ff 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1717,16 +1717,11 @@ static int mf_generic_kill_procs(unsigned long long pfn, int flags,
struct dev_pagemap *pgmap)
{
struct page *page = pfn_to_page(pfn);
Looks like the above line, that is, the 'page' pointer is no longer needed.
So ...
It seems to me that currently handling of hwpoison for DAX memory is
handled on a per-allocation basis but it should probably be handled
on a per-page basis eventually?
My recollection is that since the inception of
memory_failure_dev_pagemap(), dax poison handling is kind of on per-page
basis in that, the .si_addr points to the subpage vaddr, though the
.si_lsb indicates the user mapping size.
If so, we'd want to do something like this ...
+++ b/mm/memory-failure.c
@@ -1755,7 +1755,9 @@ static int mf_generic_kill_procs(unsigned long long pfn, int flags,
* Use this flag as an indication that the dax page has been
* remapped UC to prevent speculative consumption of poison.
*/
- SetPageHWPoison(&folio->page);
+ SetPageHWPoison(page);
+ if (folio_test_large(folio))
+ folio_set_has_hwpoisoned(folio);
/*
* Unlike System-RAM there is no possibility to swap in a
@@ -1766,7 +1768,8 @@ static int mf_generic_kill_procs(unsigned long long pfn, int flags,
flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
collect_procs(&folio->page, &to_kill, true);
- unmap_and_kill(&to_kill, pfn, folio->mapping, folio->index, flags);
+ unmap_and_kill(&to_kill, pfn, folio->mapping,
+ folio->index + folio_page_idx(folio, page), flags);
unlock:
dax_unlock_folio(folio, cookie);
return rc;
This change make sense to me.
mf_generic_kill_procs() is the generic version if DAX-FS does not
register/provide dax_dev->holder_ops->notify_failure. Currently only
DAX-XFS does the registration and utilizes the specific version:
mf_dax_kill_procs().
But this is a change in current behaviour and I didn't want to think
through the implications of all of this. Would you like to take on this
project? ;-)
Sure, be happy to.
My vague plan for hwpoison in the memdesc world is that poison is always
handled on a per-page basis (by means of setting page->memdesc to a
hwpoison data structure). If the allocation contains multiple pages,
then we set a flag somewhere like the current has_hwpoisoned flag.
Could you elaborate on "by means of setting page->memdesc to a hwpoison
data structure" please?
As for the PG_has_hwpoisoned flag, I see one reference for now -
$ git grep folio_test_has_hwpoisoned
mm/shmem.c: folio_test_has_hwpoisoned(folio)))
$ git grep folio_clear_has_hwpoisoned
<none>
A dax thp page is a thp page that is potentially recoverable from
hwpoison(s). If a dax thp page has multiple hwpoisons, only when the
last poisoned subpage is recovered, could we call
folio_clear_has_hwpoisoned() - this implies keeping track of the number
of poisoned subpages per thp somehow, any thoughts on the best thing to
do? hmm, maybe add a field in pgmap? or add a query to the driver to
return whether there is any hwpoison in a given range?
thanks!
-jane