On Tue, 10 May 2022 11:45:59 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > On some architectures (like ARM64), it can support CONT-PTE/PMD size > hugetlb, which means it can support not only PMD/PUD size hugetlb: > 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page > size specified. > > When migrating a hugetlb page, we will get the relevant page table > entry by huge_pte_offset() only once to nuke it and remap it with > a migration pte entry. This is correct for PMD or PUD size hugetlb, > since they always contain only one pmd entry or pud entry in the > page table. > > However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, > since they can contain several continuous pte or pmd entry with > same page table attributes. So we will nuke or remap only one pte > or pmd entry for this CONT-PTE/PMD size hugetlb page, which is > not expected for hugetlb migration. The problem is we can still > continue to modify the subpages' data of a hugetlb page during > migrating a hugetlb page, which can cause a serious data consistent > issue, since we did not nuke the page table entry and set a > migration pte for the subpages of a hugetlb page. > > To fix this issue, we should change to use huge_ptep_clear_flush() > to nuke a hugetlb page table, and remap it with set_huge_pte_at() > and set_huge_swap_pte_at() when migrating a hugetlb page, which > already considered the CONT-PTE or CONT-PMD size hugetlb. > > ... > > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr > pte_t *ptep, pte_t pte, unsigned long sz) > { > } > + > +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, > + unsigned long addr, pte_t *ptep) > +{ > + return ptep_get(ptep); > +} > + > +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, > + pte_t *ptep, pte_t pte) > +{ > +} > #endif /* CONFIG_HUGETLB_PAGE */ > This blows up nommu (arm allnoconfig): In file included from fs/io_uring.c:71: ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] 1100 | return ptep_get(ptep); | ^~~~~~~~ huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply zapped this change. --- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix +++ a/include/linux/hugetlb.h @@ -1093,17 +1093,6 @@ static inline void set_huge_swap_pte_at( pte_t *ptep, pte_t pte, unsigned long sz) { } - -static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) -{ - return ptep_get(ptep); -} - -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte) -{ -} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, _