On 2/5/25 20:39, Ryan Roberts wrote:
> ___set_ptes() previously called __set_pte() for each PTE in the range,
> which would conditionally issue a DSB and ISB to make the new PTE value
> immediately visible to the table walker if the new PTE was valid and for
> kernel space.
>
> We can do better than this; let's hoist those barriers out of the loop
> so that they are only issued once at the end of the loop. We then reduce
> the cost by the number of PTEs in the range.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> ---
>  arch/arm64/include/asm/pgtable.h | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 3b55d9a15f05..1d428e9c0e5a 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>  	WRITE_ONCE(*ptep, pte);
>  }
>
> -static inline void __set_pte(pte_t *ptep, pte_t pte)
> +static inline void __set_pte_complete(pte_t pte)
>  {
> -	__set_pte_nosync(ptep, pte);
> -
>  	/*
>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>  	 * or update_mmu_cache() have the necessary barriers.
> @@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
>  	}
>  }
>
> +static inline void __set_pte(pte_t *ptep, pte_t pte)
> +{
> +	__set_pte_nosync(ptep, pte);
> +	__set_pte_complete(pte);
> +}
> +
>  static inline pte_t __ptep_get(pte_t *ptep)
>  {
>  	return READ_ONCE(*ptep);
> @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pte,
>
>  	for (;;) {
>  		__check_safe_pte_update(mm, ptep, pte);
> -		__set_pte(ptep, pte);
> +		__set_pte_nosync(ptep, pte);
>  		if (--nr == 0)
>  			break;
>  		ptep++;
>  		pte = pte_advance_pfn(pte, stride);
>  	}
> +
> +	__set_pte_complete(pte);

Given that the loop now iterates over the page table entries without a
corresponding dsb/isb sync per entry, could something else get scheduled
on the CPU before __set_pte_complete() is called, leaving the entire
block of page table entries without the desired mapping effect? IOW, how
is __set_pte_complete() guaranteed to execute once the loop above
completes?

Otherwise this change LGTM.

> }
>
> static inline void __set_ptes(struct mm_struct *mm,
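
For anyone else following along, here is my condensed reading of how the
two halves fit together after this patch. The pte_valid_not_user() test
and the dsb(ishst)/isb() pair inside __set_pte_complete() are not shown
in the hunks above, so that part is my assumption based on the
pre-existing __set_pte() body rather than something this diff proves:

	static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
	{
		WRITE_ONCE(*ptep, pte);		/* plain store, no barriers */
	}

	static inline void __set_pte_complete(pte_t pte)
	{
		/*
		 * Only valid kernel mappings need the barriers; user
		 * mappings rely on TLB maintenance or update_mmu_cache().
		 * (Condition assumed from the existing __set_pte() body.)
		 */
		if (pte_valid_not_user(pte)) {
			dsb(ishst);	/* make PTE stores visible to the walker */
			isb();
		}
	}

	static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep,
				       pte_t pte, unsigned int nr,
				       unsigned long stride)
	{
		for (;;) {
			__check_safe_pte_update(mm, ptep, pte);
			__set_pte_nosync(ptep, pte);	/* no per-PTE barriers */
			if (--nr == 0)
				break;
			ptep++;
			pte = pte_advance_pfn(pte, stride);
		}

		/* One dsb/isb pair for the whole batch, not one per PTE. */
		__set_pte_complete(pte);
	}

If that reading is right, the barrier decision is made from the last PTE
in the batch only, which is presumably fine because all PTEs set by a
single ___set_ptes() call share the same attributes.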