On Thu, Dec 28, 2023 at 04:46:41PM +0800, Jisheng Zhang wrote:
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 846c563689a8..6164c5f3b78f 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -62,7 +62,10 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  	 * invalidating the walk-cache, since the ASID allocator won't
>  	 * reallocate our ASID without invalidating the entire TLB.
>  	 */
> -	if (tlb->fullmm) {
> +	if (tlb->fullmm)
> +		return;
> +
> +	if (tlb->need_flush_all) {
>  		if (!last_level)
>  			flush_tlb_mm(tlb->mm);
>  		return;

I don't think that's correct. IIRC, commit f270ab88fdf2 ("arm64: tlb:
Adjust stride and type of TLBI according to mmu_gather") explicitly
added the !last_level check to invalidate the walk cache (correspondence
between the VA and the page table page rather than the full VA->PA
translation).

> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 129a3a759976..f2d46357bcbb 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -452,7 +452,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>  	 * these bits.
>  	 */
>  	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
> -	      tlb->cleared_puds || tlb->cleared_p4ds))
> +	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->need_flush_all))
>  		return;
> 
>  	tlb_flush(tlb);
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 4f559f4ddd21..79298bac3481 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -384,7 +384,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb)
>  	 * On x86 non-fullmm doesn't yield significant difference
>  	 * against fullmm.
>  	 */
> -	tlb->fullmm = 1;
> +	tlb->need_flush_all = 1;
>  	__tlb_reset_range(tlb);
>  	tlb->freed_tables = 1;
>  }

The optimisation here was added about a year later in commit
7a30df49f63a ("mm: mmu_gather: remove __tlb_reset_range() for force
flush"). Do we still need to keep freed_tables = 1 here? I'd say only
__tlb_reset_range().

-- 
Catalin
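
FWIW, the shape I'd expect is roughly the below (untested sketch only,
and it assumes need_flush_all is handled on the arm64 side with an
unconditional flush rather than reusing the !last_level walk-cache
check):

	/* arch/arm64/include/asm/tlb.h: keep the fullmm walk-cache handling */
	if (tlb->fullmm) {
		if (!last_level)
			flush_tlb_mm(tlb->mm);
		return;
	}

	/* forced flush from a racing unmap: flush the whole ASID */
	if (tlb->need_flush_all) {
		flush_tlb_mm(tlb->mm);
		return;
	}

with tlb_finish_mmu() then only resetting the range:

	if (mm_tlb_flush_nested(tlb->mm)) {
		tlb->need_flush_all = 1;
		__tlb_reset_range(tlb);
	}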