>On 9/20/19 8:51 AM, Qiujun Huang wrote: >> __get_user_pages_fast try to walk the page table but the >> hugepage pte is replace by hwpoison swap entry by mca path. > >I expect you mean MCE (machine check exception), rather than mca? Yeah > >> ... >> [15798.177437] mce: Uncorrected hardware memory error in >> user-access at 224f1761c0 >> [15798.180171] MCE 0x224f176: Killing pal_main:6784 due to >> hardware memory corruption >> [15798.180176] MCE 0x224f176: Killing qemu-system-x86:167336 >> due to hardware memory corruption >> ... >> [15798.180206] BUG: unable to handle kernel >> [15798.180226] paging request at ffff891200003000 >> [15798.180236] IP: [<ffffffff8106edae>] gup_pud_range+ >> 0x13e/0x1e0 >> ... >> >> We need to skip the hwpoison entry in gup_pud_range. > >It would be nice if this spelled out a little more clearly what's >wrong. I think you and Aneesh are saying that the entry is really >a swap entry, created by the MCE response to a bad page? do_machine_check-> do_memory_failure-> memory_failure-> hwpoison_user_mappings will updated PUD level PTE entry as a swap entry. static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, unsigned long address, void *arg) { ... if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (PageHuge(page)) { int nr = 1 << compound_order(page); hugetlb_count_sub(nr, mm); set_huge_swap_pte_at(mm, address, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { dec_mm_counter(mm, mm_counter(page)); set_pte_at(mm, address, pvmw.pte, pteval); } ... and, gup_pud_range will reference the pud entry. gup_pud_range->gup_pmd_range: static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) { unsigned long next; pmd_t *pmdp; pmdp = pmd_offset(&pud, addr); do { pmd_t pmd = *pmdp; <--the pmdp is hwpoison swap entry. ffff891200003000 and results in corruption ... > >> >> Signed-off-by: Qiujun Huang <hqjagain@xxxxxxxxx> >> --- >> mm/gup.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/mm/gup.c b/mm/gup.c >> index 98f13ab..6157ed9 100644 >> --- a/mm/gup.c >> +++ b/mm/gup.c >> @@ -2230,6 +2230,8 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, >> next = pud_addr_end(addr, end); >> if (pud_none(pud)) >> return 0; >> + if (unlikely(!pud_present(pud))) >> + return 0; > >If the MCE hwpoison behavior puts in swap entries, then it seems like all >page table walkers would need to check for p*d_present(), and maybe at all >levels too, right? I think so > >thanks, On Sat, Sep 21, 2019 at 3:37 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote: > > On 9/20/19 8:51 AM, Qiujun Huang wrote: > > __get_user_pages_fast try to walk the page table but the > > hugepage pte is replace by hwpoison swap entry by mca path. > > I expect you mean MCE (machine check exception), rather than mca? > > > ... > > [15798.177437] mce: Uncorrected hardware memory error in > > user-access at 224f1761c0 > > [15798.180171] MCE 0x224f176: Killing pal_main:6784 due to > > hardware memory corruption > > [15798.180176] MCE 0x224f176: Killing qemu-system-x86:167336 > > due to hardware memory corruption > > ... > > [15798.180206] BUG: unable to handle kernel > > [15798.180226] paging request at ffff891200003000 > > [15798.180236] IP: [<ffffffff8106edae>] gup_pud_range+ > > 0x13e/0x1e0 > > ... > > > > We need to skip the hwpoison entry in gup_pud_range. > > It would be nice if this spelled out a little more clearly what's > wrong. I think you and Aneesh are saying that the entry is really > a swap entry, created by the MCE response to a bad page? > > > > > Signed-off-by: Qiujun Huang <hqjagain@xxxxxxxxx> > > --- > > mm/gup.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/mm/gup.c b/mm/gup.c > > index 98f13ab..6157ed9 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -2230,6 +2230,8 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, > > next = pud_addr_end(addr, end); > > if (pud_none(pud)) > > return 0; > > + if (unlikely(!pud_present(pud))) > > + return 0; > > If the MCE hwpoison behavior puts in swap entries, then it seems like all > page table walkers would need to check for p*d_present(), and maybe at all > levels too, right? > > thanks, > -- > John Hubbard > NVIDIA > > > > if (unlikely(pud_huge(pud))) { > > if (!gup_huge_pud(pud, pudp, addr, next, flags, > > pages, nr)) > >