On 21/02/2025 15:31, Will Deacon wrote:
> On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote:
>> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
>> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
>> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
>> CONT_PMD_SIZE). So the function has to figure out the size from the
>> huge_pte pointer. This was previously done by walking the pgtable to
>> determine the level and by using the PTE_CONT bit to determine the
>> number of ptes at the level.
>>
>> But the PTE_CONT bit is only valid when the pte is present. For
>> non-present pte values (e.g. markers, migration entries), the previous
>> implementation was therefore erroneously determining the size. There is
>> at least one known caller in core-mm, move_huge_pte(), which may call
>> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
>> this case. Additionally, the "regular" ptep_get_and_clear() is robust to
>> being called for non-present ptes, so it makes sense to follow that
>> behaviour.
>>
>> Fix this by using the new sz parameter which is now provided to the
>> function. Additionally, when clearing each pte in a contig range, don't
>> gather the access and dirty bits if the pte is not present.
>>
>> An alternative approach that would not require API changes would be to
>> store the PTE_CONT bit in a spare bit in the swap entry pte for the
>> non-present case. But it felt cleaner to follow other APIs' lead and
>> just pass in the size.
>>
>> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
>> entry offset field (layout of non-present pte). Since hugetlb is never
>> swapped to disk, this field will only be populated for markers, which
>> always set this bit to 0, and hwpoison swap entries, which set the
>> offset field to a PFN. So it would only ever be 1 for a 52-bit PA system
>> where memory in that high half was poisoned (I think!). So in practice,
>> this bit would almost always be zero for non-present ptes and we would
>> only clear the first entry if it was actually a contiguous block. That's
>> probably a less severe symptom than if it was always interpreted as 1
>> and cleared out potentially-present neighboring PTEs.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
>>  1 file changed, 17 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 06db4649af91..614b2feddba2 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
>>  			       unsigned long pgsize,
>>  			       unsigned long ncontig)
>>  {
>> -	pte_t orig_pte = __ptep_get(ptep);
>> -	unsigned long i;
>> -
>> -	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> -		pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
>> -
>> -		/*
>> -		 * If HW_AFDBM is enabled, then the HW could turn on
>> -		 * the dirty or accessed bit for any page in the set,
>> -		 * so check them all.
>> -		 */
>> -		if (pte_dirty(pte))
>> -			orig_pte = pte_mkdirty(orig_pte);
>> -
>> -		if (pte_young(pte))
>> -			orig_pte = pte_mkyoung(orig_pte);
>> +	pte_t pte, tmp_pte;
>> +	bool present;
>> +
>> +	pte = __ptep_get_and_clear(mm, addr, ptep);
>> +	present = pte_present(pte);
>> +	while (--ncontig) {
>> +		ptep++;
>> +		addr += pgsize;
>> +		tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>> +		if (present) {
>> +			if (pte_dirty(tmp_pte))
>> +				pte = pte_mkdirty(pte);
>> +			if (pte_young(tmp_pte))
>> +				pte = pte_mkyoung(pte);
>> +		}
>>  	}
> 
> nit: With the loop now structured like this, we really can't handle
> num_contig_ptes() returning 0 if it gets an unknown size. Granted, that
> really shouldn't happen, but perhaps it would be better to add a 'default'
> case with a WARN() to num_contig_ptes() and then add an early return here?

Looking at other users of num_contig_ptes(), it looks like huge_ptep_get()
already assumes at least 1 pte (it calls __ptep_get() before calling
num_contig_ptes()) and set_huge_pte_at() assumes 1 pte for the "present and
non-contig" case. So a return value of 0 from num_contig_ptes() is already
not handled consistently.

How about we change the default num_contig_ptes() return value to 1 and add
a warning if the size is invalid:

---8<---
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 614b2feddba2..b3a7fafe8892 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -100,20 +100,11 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 
 static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
 {
-	int contig_ptes = 0;
+	int contig_ptes = 1;
 
 	*pgsize = size;
 
 	switch (size) {
-#ifndef __PAGETABLE_PMD_FOLDED
-	case PUD_SIZE:
-		if (pud_sect_supported())
-			contig_ptes = 1;
-		break;
-#endif
-	case PMD_SIZE:
-		contig_ptes = 1;
-		break;
 	case CONT_PMD_SIZE:
 		*pgsize = PMD_SIZE;
 		contig_ptes = CONT_PMDS;
@@ -122,6 +113,8 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
 		*pgsize = PAGE_SIZE;
 		contig_ptes = CONT_PTES;
 		break;
+	default:
+		WARN_ON(!__hugetlb_valid_size(size));
 	}
 
 	return contig_ptes;
---8<---

> 
> Will
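For reference, the call site that ties the new sz parameter to the loop above
would look roughly like this. This is a simplified sketch inferred from the
quoted hunks, not the verbatim patched code: it assumes num_contig_ptes() and
get_clear_contig() exactly as shown in the diffs above.

pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
			      pte_t *ptep, unsigned long sz)
{
	int ncontig;
	size_t pgsize;

	/* Derive the geometry from the caller-supplied size rather than
	 * from PTE_CONT, which is meaningless for non-present ptes. */
	ncontig = num_contig_ptes(sz, &pgsize);
	return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
}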
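As an aside on the nit: the reason the restructured loop cannot tolerate
ncontig == 0 is that ncontig is an unsigned long, so the pre-decrement in
"while (--ncontig)" wraps to ULONG_MAX and the loop would walk (and clear)
~2^64 entries. A minimal userspace sketch of that failure mode follows; this
is demo code, not kernel code, and the iteration cap exists only so the demo
terminates.

/* demo.c: why "while (--ncontig)" must never start from ncontig == 0. */
#include <stdio.h>

/* Stand-in for the loop shape in get_clear_contig(). */
static unsigned long loop_body_runs(unsigned long ncontig)
{
	unsigned long runs = 0;

	while (--ncontig) {	/* 0 pre-decrements (wraps) to ULONG_MAX */
		runs++;
		if (runs > 8)	/* cap added so the demo terminates */
			break;
	}
	return runs;
}

int main(void)
{
	printf("ncontig=4: body runs %lu times\n", loop_body_runs(4)); /* 3 */
	printf("ncontig=1: body runs %lu times\n", loop_body_runs(1)); /* 0 */
	printf("ncontig=0: body runs %lu times (capped; would be ~2^64)\n",
	       loop_body_runs(0));
	return 0;
}

With num_contig_ptes() defaulting to 1, the ncontig=1 row is the degraded
behaviour for an invalid size: the first __ptep_get_and_clear() is the only
access and the loop body never runs.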