On Tue, Nov 23, 2021 at 4:32 PM Yang Shi <shy828301@xxxxxxxxx> wrote: > > On Tue, Nov 23, 2021 at 4:11 PM HORIGUCHI NAOYA(堀口 直也) > <naoya.horiguchi@xxxxxxx> wrote: > > > > On Mon, Nov 22, 2021 at 11:28:10AM -0800, Yang Shi wrote: > > > On Sat, Nov 20, 2021 at 9:44 AM Matthew Wilcox (Oracle) > > > <willy@xxxxxxxxxxxxx> wrote: > > > > > > > > Pages are individually marked as suffering from hardware poisoning. > > > > Checking that the head page is not hardware poisoned doesn't make > > > > sense; we might be after a subpage. We check each page individually > > > > before we use it, so this was an optimisation gone wrong. > > > > > > Yeah, it doesn't make too much sense to check the head page. And it > > > seems the non-poisoned subpages could be PTE mapped instead of > > > skipping the whole THP. > > > > > > Not sure if this is by design, it seems the hwpoisoned check in > > > filemap_map_pages() does skip the subpages after the poisoned page. Or > > > we should just skip the poisoned page itself? If so the below change > > > may be needed: > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index daa0e23a6ee6..f1f0cb263b4a 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -3318,7 +3318,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, > > > do { > > > page = find_subpage(head, xas.xa_index); > > > if (PageHWPoison(page)) > > > - goto unlock; > > > + goto skip; > > > > > > if (mmap_miss > 0) > > > mmap_miss--; > > > @@ -3337,6 +3337,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, > > > do_set_pte(vmf, page, addr); > > > /* no need to invalidate: a not-present page won't be cached */ > > > update_mmu_cache(vma, addr, vmf->pte); > > > +skip: > > > unlock_page(head); > > > continue; > > > unlock: > > > > first_map_page() or next_map_page() returns a page (if found) with > > holding the refcount, and the new 'goto skip' path skips releasing it. > > So this looks to me lead to the mismatch of refcount. > > Could you explain the intention a little more (maybe related to your > > recent patch about keeping hwpoison page in pagecache?) ? > > No, not related to my patches. > > The current code maps the subpages by PTEs *before* the poisoned page, > but skips the subpages *after* the poisoned page IIUC. It seems not > right, I thought the code was intended to map all subpages by PTEs > except the poisoned pages. So the suggested code is trying to fix the > misbehavior. Err... I think I misread the code. It does iterate every subpages to map each of them by PTE and just skip the hwpoisoned subpages. Sorry for the confusion. > > That code is just a quick and untested illustration to the above > hypothesis. The corrected version: > > diff --git a/mm/filemap.c b/mm/filemap.c > index daa0e23a6ee6..1a76e3edc878 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -3317,8 +3317,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl); > do { > page = find_subpage(head, xas.xa_index); > - if (PageHWPoison(page)) > - goto unlock; > + if (PageHWPoison(page)) { > + unlock_page(page); > + put_page(page); > + continue; > + } > > if (mmap_miss > 0) > mmap_miss--; > > > > > Thanks, > > Naoya Horiguchi > > > > > > > > > > > > > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > > > --- > > > > mm/filemap.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 0b6f996108b4..65973204112d 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -3239,8 +3239,6 @@ static struct page *next_uptodate_page(struct page *page, > > > > goto skip; > > > > if (!PageUptodate(page) || PageReadahead(page)) > > > > goto skip; > > > > - if (PageHWPoison(page)) > > > > - goto skip; > > > > if (!trylock_page(page)) > > > > goto skip; > > > > if (page->mapping != mapping) > > > > -- > > > > 2.33.0 > > > >