On 2/1/22 17:47, Matthew Wilcox wrote: > On Tue, Feb 01, 2022 at 05:40:57PM +0000, Joao Martins wrote: >> On 2/1/22 16:01, Joao Martins wrote: >>> On 2/1/22 15:46, Matthew Wilcox wrote: >>>> On Mon, Jan 31, 2022 at 08:54:39PM +0000, Joao Martins wrote: >>>>> On 1/31/22 20:29, Matthew Wilcox wrote: >>>>>> Unless I am mistaken, you have to pass the compound head of the page >>>>>> which has the error to collect_procs(). Am I mistaken? >>>>>> >>>>> -rc2 already has a fix for it: >>>>> >>>>> https://lore.kernel.org/linux-mm/20220129021420.PgBIZm-q9%25akpm@xxxxxxxxxxxxxxxxxxxx/ >>>>> >>>>> Earlier in that function there's a: >>>>> >>>>> page = compound_head(page); >>>>> >>>>> So the @page passed to collect_procs() already is a head page. >>>> >>>> It's wrong though ;-( You set the HWPoison bit on the page after >>>> calling compound_head(), so you set the bit on the head page instead >>>> of the precise page that had the poison. >>>> >>> Considering that on device-dax we would unmap the whole 2M page regardless >>> of the poisoned subpage isn't that actually representative still? >>> >> >> To say this another way. We do set the HWPoison on the head page, >> and not the subpage as you say, but we end up propagating the resultant >> MCE action on the superset of pages in the whole PMD or PUD. >> >> What I was trying to say in perhaps a convoluted way is that device-dax >> case isn't different than HugeTLB that only wants to poison head. If there's >> a head page, there's likely a PMD or PUD populated (depending what the device >> was onlined with) and thus that's what gets unmapped. There's no idea of >> subpages being treated any differently, at least as far as device-dax is >> concerned -- unless I miss auditing some other code path. > > Using HugeTLB as a model is not a good idea. THP is the model to > aim for; one can choose to map pages askew (ie not aligned with a > PMD), and I don't think you'll find all the mappings with the current > code (eg if someone has mapped a single page of the hugepage). > Don't know if that last sentence referred to THP or DAX -- but to be extra sure, on device-dax current code you can't map or remap subpages of a hugepage. Even if you open a device-dax fd and touch just a subpage, what gets mapped in the user mm is the entirety of the PMD (or PUD). You can't even map a single random page, it always needs to be aligned /at a minimum/ to the namespace boundary (which is the page table granularity i.e. PTE, PMD or PUD). >> fsdax IIUC seems to rely more on the subpage bit being flagged but no >> functional change here for fsdax as there's only base pages there (no heads). >> >>>> I'm fixing this up as part of the folio patches, but you may wish to >>>> fix it earlier than that. >> (+Dan in case I misrepresented or missed something) >> >> Should we deem it a problem, I'll fix for the next -rc. >> Just in case, here's the diff stashed: >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 2e2f740c63dc..661c23df8115 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1577,7 +1577,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) >> static int memory_failure_dev_pagemap(unsigned long pfn, int flags, >> struct dev_pagemap *pgmap) >> { >> - struct page *page = pfn_to_page(pfn); >> + struct page *page = pfn_to_page(pfn), *subpage = page; >> unsigned long size = 0; >> struct to_kill *tk; >> LIST_HEAD(tokill); >> @@ -1631,7 +1631,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, >> * Use this flag as an indication that the dax page has been >> * remapped UC to prevent speculative consumption of poison. >> */ >> - SetPageHWPoison(page); >> + SetPageHWPoison(subpage); >> >> /* >> * Unlike System-RAM there is no possibility to swap in a