On some architectures (like ARM64), CONT-PTE/PMD size hugetlb is
supported, which means that not only PMD/PUD size hugetlb (2M and 1G)
is available, but also CONT-PTE/PMD size hugetlb (64K and 32M) when a
4K page size is specified.

When looking up a CONT-PTE size hugetlb page via follow_page(),
follow_page_pte() uses pte_offset_map_lock() to get the pte lock for
the CONT-PTE size hugetlb. However, this pte lock is incorrect for
CONT-PTE size hugetlb, which should use mm->page_table_lock as
returned by huge_pte_lockptr().

That means the pte entry of the CONT-PTE size hugetlb is not stable
under the current pte lock in follow_page_pte(): the entry can still
be migrated or poisoned concurrently, which can cause potential races.
Since the pte entry is unstable, the subsequent pte_xxx() validation
in follow_page_pte() is also incorrect, even though it runs under the
'pte lock'.

To fix this issue, check whether the VMA is a hugetlb VMA first, and
if so, use huge_pte_lockptr() to get the correct pte lock and read the
pte value with huge_ptep_get(), so that the pte entry is stable under
the correct lock.

Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
 mm/gup.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 5aa7531..3b2fa86 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -534,8 +534,26 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
 
-	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
-	pte = *ptep;
+	/*
+	 * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+	 * ARM64 architecture.
+	 */
+	if (is_vm_hugetlb_page(vma)) {
+		struct hstate *hstate = hstate_vma(vma);
+		unsigned long size = huge_page_size(hstate);
+
+		ptep = huge_pte_offset(mm, address, size);
+		if (!ptep)
+			return no_page_table(vma, flags);
+
+		ptl = huge_pte_lockptr(hstate, mm, ptep);
+		spin_lock(ptl);
+		pte = huge_ptep_get(ptep);
+	} else {
+		ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+		pte = *ptep;
+	}
+
 	if (!pte_present(pte)) {
 		swp_entry_t entry;
 		/*
-- 
1.8.3.1