Hi Samuel, On Fri, Mar 1, 2024 at 7:22 AM Samuel Holland <samuel.holland@xxxxxxxxxx> wrote: > > Since implementations affected by SiFive errata CIP-1200 always use the > global variant of the sfence.vma instruction, they only need to execute > the instruction once. The range-based loop only hurts performance. > > Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx> > --- > > (no changes since v4) > > Changes in v4: > - Only set tlb_flush_all_threshold when CONFIG_MMU=y. > > Changes in v3: > - New patch for v3 > > arch/riscv/errata/sifive/errata.c | 5 +++++ > arch/riscv/include/asm/tlbflush.h | 2 ++ > arch/riscv/mm/tlbflush.c | 2 +- > 3 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c > index 3d9a32d791f7..716cfedad3a2 100644 > --- a/arch/riscv/errata/sifive/errata.c > +++ b/arch/riscv/errata/sifive/errata.c > @@ -42,6 +42,11 @@ static bool errata_cip_1200_check_func(unsigned long arch_id, unsigned long imp > return false; > if ((impid & 0xffffff) > 0x200630 || impid == 0x1200626) > return false; > + > +#ifdef CONFIG_MMU > + tlb_flush_all_threshold = 0; > +#endif > + > return true; > } > > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h > index 463b615d7728..8e329721375b 100644 > --- a/arch/riscv/include/asm/tlbflush.h > +++ b/arch/riscv/include/asm/tlbflush.h > @@ -66,6 +66,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, > unsigned long uaddr); > void arch_flush_tlb_batched_pending(struct mm_struct *mm); > void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); > + > +extern unsigned long tlb_flush_all_threshold; > #else /* CONFIG_MMU */ > #define local_flush_tlb_all() do { } while (0) > #endif /* CONFIG_MMU */ > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c > index 365e0a0e4725..22870f213188 100644 > --- a/arch/riscv/mm/tlbflush.c > +++ b/arch/riscv/mm/tlbflush.c > @@ -11,7 +11,7 @@ > * Flush entire TLB if number of entries to be flushed is greater > * than the threshold below. > */ > -static unsigned long tlb_flush_all_threshold __read_mostly = 64; > +unsigned long tlb_flush_all_threshold __read_mostly = 64; > > static void local_flush_tlb_range_threshold_asid(unsigned long start, > unsigned long size, > -- > 2.43.1 > If local_flush_tlb_all_asid() is used every time, more PTWs will be generated. Will such modifications definitely improve the overall performance? Hi Alex, Samuel, The relationship between flush_xx_range_asid() and nr_ptes is basically linear growth (y=kx +b), while flush_all_asid() has nothing to do with nr_ptes (y=c). Some TLBs may do some optimization. The operation of flush all itself requires very few cycles, but there is a certain delay between consecutive flush all. The intersection of the two straight lines is the optimal solution of tlb_flush_all_threshold. In actual situations, continuous flush_all_asid will not occur. One problem caused by flush_all_asid() is that multiple flush entries require PTW, which causes greater latency. Therefore, the value of tlb_flush_all_threshold needs to be considered or quantified. Maybe doing local_flush_tlb_page_asid() based on the actual nr_ptes_in_range would give better overall performance. What do you think? Thanks, Yunhui