Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware

On 2/24/2025 10:49 PM, Miaohe Lin wrote:

On 2025/2/25 5:14, Jane Chu wrote:
When a process consumes a UE in a page, the memory failure handler
attempts to collect information for a potential SIGBUS.
If the page is an anonymous page, page_mapped_in_vma(page, vma) is
invoked in order to
   1. retrieve the vaddr from the process' address space,
   2. verify that the vaddr is indeed mapped to the poisoned page,
where 'page' is the precise small page with UE.

It's been observed that when injecting poison to a non-head subpage
of an anonymous hugetlb page, no SIGBUS shows up, while injecting to
the head page produces a SIGBUS. The cause is that, though hugetlb_walk()
returns a valid pmd entry (on x86), check_pte() detects a mismatch
between the head page per the pmd and the input subpage. Thus the vaddr
is considered not mapped to the subpage and the process is not collected
for SIGBUS purposes. This is the calling stack:
       collect_procs_anon
         page_mapped_in_vma
           page_vma_mapped_walk
             hugetlb_walk
               huge_pte_lock
                 check_pte

The check_pte() header says that it
"check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte",
but in practice it works only if pvmw->pfn is the head page pfn at pvmw->pte.
In hindsight, since a pvmw->pte could point to a hugepage of some sort,
it makes sense to make check_pte() work for hugepages.
Thanks for your patch. This patch looks good to me.

Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
Is a Fixes tag needed?

I don't have a clear call and here is the reason.

Since the introduction of check_pte() by ace71a19cec5e ("mm: introduce page_vma_mapped_walk()"), it has carried the assumption that pvmw->page (later changed to pvmw->pfn) points to the head of a huge page or to a small page, and it has been used that way, so it doesn't really check whether a given subpage range falls within a huge leaf pte range. When 376907f3a0b34 ("mm/memory-failure: pass the folio and the page to collect_procs()") came along, it exposed this latent issue, which hadn't been a problem before.
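
To make the assumption above concrete, here is a minimal user-space sketch, not the kernel code and not the patch itself. The struct and both function names are hypothetical; the first check is only modeled on the "leaf pfn must fall inside the searched range" behavior described above, and the second is one possible way to test whether the searched range overlaps a huge leaf range.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pvmw fields discussed above. */
struct walk_target {
	unsigned long pfn;      /* first pfn being searched for */
	unsigned long nr_pages; /* number of pages being searched for */
};

/*
 * Start-only match: true only when the leaf's first pfn lies inside
 * [t->pfn, t->pfn + t->nr_pages). Unsigned wraparound makes this false
 * whenever leaf_pfn < t->pfn, which is the subpage case.
 */
static bool match_leaf_start_only(unsigned long leaf_pfn,
				  const struct walk_target *t)
{
	return (leaf_pfn - t->pfn) < t->nr_pages;
}

/*
 * Overlap match: true when [leaf_pfn, leaf_pfn + leaf_nr) and
 * [t->pfn, t->pfn + t->nr_pages) share at least one pfn.
 * Assumes leaf_nr and t->nr_pages are both non-zero.
 */
static bool match_leaf_overlap(unsigned long leaf_pfn, unsigned long leaf_nr,
			       const struct walk_target *t)
{
	if (leaf_pfn + leaf_nr - 1 < t->pfn)
		return false; /* leaf ends before the target starts */
	if (leaf_pfn > t->pfn + t->nr_pages - 1)
		return false; /* leaf starts after the target ends */
	return true;
}
```

With a 512-page leaf whose head pfn is 0x100000 and a one-page target at 0x100005, the start-only check rejects the mapping while the overlap check accepts it; with the target at the head pfn, both accept.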

Thanks!

-jane


Thanks.



