On Thu, Nov 17, 2022 at 08:10:15PM -0500, Peter Xu wrote:
> In hugetlb_fault(), there used to be a special path to handle swap entries
> at the entrance using huge_pte_offset().  That's unsafe because
> huge_pte_offset() for a pmd-sharable range can access freed pgtables
> without either the walker lock or the vma lock held.
>
> The simplest way to make this safe is to move the swap handling to after
> the vma lock is taken.  We may now need to take the fault mutex on either
> migration or hwpoison entries (also the vma lock, but that one is really
> needed); however, neither of them is a hot path, so it should be fine.
>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  mm/hugetlb.c | 24 +++++++-----------------
>  1 file changed, 7 insertions(+), 17 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c3aab6d5b7aa..62ff3fc51d4e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5824,22 +5824,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	int need_wait_lock = 0;
>  	unsigned long haddr = address & huge_page_mask(h);
>
> -	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> -	if (ptep) {
> -		/*
> -		 * Since we hold no locks, ptep could be stale.  That is
> -		 * OK as we are only making decisions based on content and
> -		 * not actually modifying content here.
> -		 */
> -		entry = huge_ptep_get(ptep);
> -		if (unlikely(is_hugetlb_entry_migration(entry))) {
> -			migration_entry_wait_huge(vma, ptep);
> -			return 0;
> -		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> -			return VM_FAULT_HWPOISON_LARGE |
> -				VM_FAULT_SET_HINDEX(hstate_index(h));
> -	}
> -
>  	/*
>  	 * Serialize hugepage allocation and instantiation, so that we don't
>  	 * get spurious allocation failures if two CPUs race to instantiate
> @@ -5886,8 +5870,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
>  	 * properly handle it.
>  	 */
> -	if (!pte_present(entry))
> +	if (!pte_present(entry)) {
> +		if (unlikely(is_hugetlb_entry_migration(entry)))
> +			migration_entry_wait_huge(vma, ptep);

Hmm no, we need to release the vma lock and the fault mutex before waiting
here.

Now I remember why I had a note saying I need to rework the migration wait
code.  I'll try that in the next version: the idea is a callback that
releases the proper locks for migration_entry_wait_huge(), invoked right
after the pgtable lock is released, e.g. inside
migration_entry_wait_on_locked().

> +		else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> +			ret = VM_FAULT_HWPOISON_LARGE |
> +				VM_FAULT_SET_HINDEX(hstate_index(h));
>  		goto out_mutex;
> +	}
>
>  	/*
>  	 * If we are going to COW/unshare the mapping later, we examine the
> --
> 2.37.3

--
Peter Xu
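P.S. To make the callback idea a bit more concrete, a rough sketch below.
Completely untested; the hook type, the extra parameters, and the
hugetlb_fault_unlock() helper mentioned afterwards are placeholders for
discussion, not a final API:

	/* Hypothetical hook so the caller can drop its locks before we sleep */
	typedef void (*migration_wait_unlock_fn)(void *arg);

	void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
					    spinlock_t *ptl,
					    migration_wait_unlock_fn unlock,
					    void *arg)
	{
		...
		/*
		 * At this point we are queued on the waitqueue and the
		 * pgtable lock is dropped, so it is safe for the caller
		 * to release the vma lock and the hugetlb fault mutex
		 * before we go to sleep.
		 */
		spin_unlock(ptl);
		if (unlock)
			unlock(arg);
		io_schedule();
		...
	}

hugetlb_fault() would then pass a small hugetlb_fault_unlock() helper that
does hugetlb_vma_unlock_read() plus the fault mutex unlock, and return 0
directly for the migration case instead of "goto out_mutex", since the
locks are already gone by the time the wait returns.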