On Tue, Jan 14, 2025 at 10:52 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 13.01.25 04:38, Barry Song wrote:
> > From: Barry Song <v-songbaohua@xxxxxxxx>
> >
> > This is a preparatory patch to support batch PTE unmapping in
> > `try_to_unmap_one`. It first introduces range handling for
> > `tlbbatch` flush. Currently, the range is always set to the size of
> > PAGE_SIZE.
>
> You could have spelled out why you perform the VMA -> MM change.

Sure, I’ll include some additional details in v3.

> >
> > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Cc: Will Deacon <will@xxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> > Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> > Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> > Cc: Shaoqin Huang <shahuang@xxxxxxxxxx>
> > Cc: Gavin Shan <gshan@xxxxxxxxxx>
> > Cc: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> > Cc: Mark Rutland <mark.rutland@xxxxxxx>
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: Lance Yang <ioworker0@xxxxxxxxx>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx>
> > Cc: Palmer Dabbelt <palmer@xxxxxxxxxxx>
> > Cc: Albert Ou <aou@xxxxxxxxxxxxxxxxx>
> > Cc: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> >  arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------
> >  arch/arm64/mm/contpte.c           |  2 +-
> >  arch/riscv/include/asm/tlbflush.h |  3 ++-
> >  arch/riscv/mm/tlbflush.c          |  3 ++-
> >  arch/x86/include/asm/tlbflush.h   |  3 ++-
> >  mm/rmap.c                         | 12 +++++++-----
> >  6 files changed, 28 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index bc94e036a26b..f34e4fab5aa2 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
> >          return true;
> >  }
> >
> > -static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > -                                             struct mm_struct *mm,
> > -                                             unsigned long uaddr)
> > -{
> > -        __flush_tlb_page_nosync(mm, uaddr);
> > -}
> > -
> >  /*
> >   * If mprotect/munmap/etc occurs during TLB batched flushing, we need to
> >   * synchronise all the TLBI issued with a DSB to avoid the race mentioned in
> > @@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start,
> >          return false;
> >  }
> >
> > -static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
> > +static inline void __flush_tlb_range_nosync(struct mm_struct *mm,
> >                                              unsigned long start, unsigned long end,
> >                                              unsigned long stride, bool last_level,
> >                                              int tlb_level)
> > @@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
> >          pages = (end - start) >> PAGE_SHIFT;
> >
> >          if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
> > -                flush_tlb_mm(vma->vm_mm);
> > +                flush_tlb_mm(mm);
> >                  return;
> >          }
> >
> >          dsb(ishst);
> > -        asid = ASID(vma->vm_mm);
> > +        asid = ASID(mm);
> >
> >          if (last_level)
> >                  __flush_tlb_range_op(vale1is, start, pages, stride, asid,
> > @@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
> >                  __flush_tlb_range_op(vae1is, start, pages, stride, asid,
> >                                       tlb_level, true, lpa2_is_enabled());
> >
> > -        mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
> > +        mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
> >  }
> >
> >  static inline void __flush_tlb_range(struct vm_area_struct *vma,
> > @@ -482,7 +475,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
> >                                       unsigned long stride, bool last_level,
> >                                       int tlb_level)
> >  {
> > -        __flush_tlb_range_nosync(vma, start, end, stride,
> > +        __flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
> >                                   last_level, tlb_level);
> >          dsb(ish);
> >  }
> > @@ -533,6 +526,15 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
> >          dsb(ish);
> >          isb();
> >  }
> > +
> > +static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> > +                                             struct mm_struct *mm,
> > +                                             unsigned long uaddr,
> > +                                             unsigned long size)
> > +{
> > +        __flush_tlb_range_nosync(mm, uaddr, uaddr + size,
> > +                                 PAGE_SIZE, true, 3);
> > +}
> >  #endif
> >
> >  #endif
> > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > index 55107d27d3f8..bcac4f55f9c1 100644
> > --- a/arch/arm64/mm/contpte.c
> > +++ b/arch/arm64/mm/contpte.c
> > @@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
> >           * eliding the trailing DSB applies here.
> >           */
> >          addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
> > -        __flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
> > +        __flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
> >                                   PAGE_SIZE, true, 3);
> >          }
> >
> > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> > index 72e559934952..7f3ea687ce33 100644
> > --- a/arch/riscv/include/asm/tlbflush.h
> > +++ b/arch/riscv/include/asm/tlbflush.h
> > @@ -61,7 +61,8 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >  bool arch_tlbbatch_should_defer(struct mm_struct *mm);
> >  void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> >                                 struct mm_struct *mm,
> > -                               unsigned long uaddr);
> > +                               unsigned long uaddr,
> > +                               unsigned long size);
>
> While we're at it, can we just convert this to the "two tabs"
> indentation starting on second parameter line" way of doing things? Same
> for all other cases. (at least in core code, if some arch wants their
> own weird rules on how to handle these things)

Sure.

>
> [...]
>
> >          struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
> >          int batch;
> > @@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
> >          if (!pte_accessible(mm, pteval))
> >                  return;
> >
> > -        arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
> > +        arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
> >          tlb_ubc->flush_required = true;
> >
> >          /*
> > @@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
> >  }
> >  #else
> >  static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
> > -                                      unsigned long uaddr)
> > +                                      unsigned long uaddr,
> > +                                      unsigned long size)
>
> I'll note that most tlb functions seem to consume start+end instead of
> start+size, like
>
> flush_tlb_mm_range()
> flush_tlb_kernel_range()
>
> So I'm wondering if this should be start+end instead of uaddr+size.

I can’t recall why I originally named it "uaddr" instead of "address" at:
https://lore.kernel.org/lkml/20220707125242.425242-4-21cnbao@xxxxxxxxx/
The name seems a bit odd now. It’s probably because the arm64 functions
listed below are using "uaddr":

static inline void __flush_tlb_page_nosync(struct mm_struct *mm,
                                           unsigned long uaddr);
static inline void flush_tlb_page(struct vm_area_struct *vma,
                                  unsigned long uaddr);

At that time, we only supported a single page for TLB deferred shootdown.

Regarding set_tlb_ubc_flush_pending(), I’m fine with either approach:

1. start, size
2. start, end

For option 1, the arguments are eventually converted to (start, start + size)
when calling __flush_tlb_range_nosync(). But if we convert to (start, end),
it aligns with

static inline void flush_tlb_range(struct vm_area_struct *vma,
                                   unsigned long start, unsigned long end);

So if there is no objection from Will, I will move to (start, end) in v3.
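To make that concrete, here is a rough sketch of what the (start, end) form
could look like on arm64. This is untested and only for illustration (the
actual v3 may differ); it also uses the two-tab continuation indentation
suggested above:

static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
		struct mm_struct *mm, unsigned long start,
		unsigned long end)
{
	/* sketch only: flush the whole [start, end) range being unmapped */
	__flush_tlb_range_nosync(mm, start, end, PAGE_SIZE, true, 3);
}

set_tlb_ubc_flush_pending() would then take (start, end) as well and simply
pass the range through instead of computing uaddr + size at the call site.
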
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry