Hi Samuel,

On Tue, Mar 12, 2024 at 8:36 AM Samuel Holland <samuel.holland@xxxxxxxxxx> wrote:
>
> Hi Yunhui,
>
> On 2024-02-29 8:48 PM, yunhui cui wrote:
> > Hi Samuel,
> >
> > On Fri, Mar 1, 2024 at 7:22 AM Samuel Holland <samuel.holland@xxxxxxxxxx> wrote:
> >>
> >> Since implementations affected by SiFive errata CIP-1200 always use the
> >> global variant of the sfence.vma instruction, they only need to execute
> >> the instruction once. The range-based loop only hurts performance.
> >>
> >> Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
> >> ---
> >>
> >> (no changes since v4)
> >>
> >> Changes in v4:
> >>  - Only set tlb_flush_all_threshold when CONFIG_MMU=y.
> >>
> >> Changes in v3:
> >>  - New patch for v3
> >>
> >>  arch/riscv/errata/sifive/errata.c | 5 +++++
> >>  arch/riscv/include/asm/tlbflush.h | 2 ++
> >>  arch/riscv/mm/tlbflush.c          | 2 +-
> >>  3 files changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
> >> index 3d9a32d791f7..716cfedad3a2 100644
> >> --- a/arch/riscv/errata/sifive/errata.c
> >> +++ b/arch/riscv/errata/sifive/errata.c
> >> @@ -42,6 +42,11 @@ static bool errata_cip_1200_check_func(unsigned long arch_id, unsigned long imp
> >>                 return false;
> >>         if ((impid & 0xffffff) > 0x200630 || impid == 0x1200626)
> >>                 return false;
> >> +
> >> +#ifdef CONFIG_MMU
> >> +       tlb_flush_all_threshold = 0;
> >> +#endif
> >> +
> >>         return true;
> >>  }
> >>
> >> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> >> index 463b615d7728..8e329721375b 100644
> >> --- a/arch/riscv/include/asm/tlbflush.h
> >> +++ b/arch/riscv/include/asm/tlbflush.h
> >> @@ -66,6 +66,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> >>                                unsigned long uaddr);
> >>  void arch_flush_tlb_batched_pending(struct mm_struct *mm);
> >>  void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> >> +
> >> +extern unsigned long tlb_flush_all_threshold;
> >>  #else /* CONFIG_MMU */
> >>  #define local_flush_tlb_all()                   do { } while (0)
> >>  #endif /* CONFIG_MMU */
> >> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> >> index 365e0a0e4725..22870f213188 100644
> >> --- a/arch/riscv/mm/tlbflush.c
> >> +++ b/arch/riscv/mm/tlbflush.c
> >> @@ -11,7 +11,7 @@
> >>   * Flush entire TLB if number of entries to be flushed is greater
> >>   * than the threshold below.
> >>   */
> >> -static unsigned long tlb_flush_all_threshold __read_mostly = 64;
> >> +unsigned long tlb_flush_all_threshold __read_mostly = 64;
> >>
> >>  static void local_flush_tlb_range_threshold_asid(unsigned long start,
> >>                                                   unsigned long size,
> >> --
> >> 2.43.1
> >>
> >
> > If local_flush_tlb_all_asid() is used every time, more PTWs will be
> > generated. Will such modifications definitely improve the overall
> > performance?
>
> This change in this commit specifically applies to older SiFive SoCs with a bug
> making single-page sfence.vma instructions unsafe to use. In this case, a single
> call to local_flush_tlb_all_asid() is optimal, yes.

Would it be clearer to add this explanation to the commit message?

> > Hi Alex, Samuel,
> > The relationship between flush_xx_range_asid() and nr_ptes is
> > basically linear growth (y=kx +b), while flush_all_asid() has nothing
> > to do with nr_ptes (y=c).
> > Some TLBs may do some optimization. The operation of flush all itself
> > requires very few cycles, but there is a certain delay between
> > consecutive flush all.
> > The intersection of the two straight lines is the optimal solution of
> > tlb_flush_all_threshold. In actual situations, continuous
> > flush_all_asid will not occur. One problem caused by flush_all_asid()
> > is that multiple flush entries require PTW, which causes greater
> > latency.
> > Therefore, the value of tlb_flush_all_threshold needs to be considered
> > or quantified. Maybe doing local_flush_tlb_page_asid() based on the
> > actual nr_ptes_in_range would give better overall performance.
> > What do you think?
>
> Yes, this was something Alex brought up when adding this threshold, that it
> should be tuned for various scenarios. That still needs to be done. This patch
> just covers one specific case where we know the optimal answer due to an erratum.
>
> Regards,
> Samuel
>

Thanks,
Yunhui
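
P.S. To make the cost model above concrete, here is a small standalone userspace
sketch. It is not kernel code, and the cycle constants (PAGE_FLUSH_CYCLES,
RANGE_LOOP_OVERHEAD, FLUSH_ALL_CYCLES, PTW_REFILL_CYCLES) are made-up placeholders,
not measurements on any real part. It only illustrates that the crossover of the
linear per-page cost (y = kx + b) with the constant flush-all cost (y = c) is
where a tlb_flush_all_threshold value would come from; on parts hit by CIP-1200
the per-page instruction already behaves as a global flush, so the patch can
simply set the threshold to 0.

/*
 * Illustrative model only: all constants below are hypothetical placeholders,
 * not measurements.
 *
 * Flushing page by page costs roughly k * nr_ptes + b cycles (linear in the
 * number of PTEs), while a full flush costs a constant amount up front plus
 * the page table walks needed to refill entries that were still hot.
 */
#include <stdio.h>

#define PAGE_FLUSH_CYCLES     40UL  /* hypothetical cost of one per-page sfence.vma  */
#define RANGE_LOOP_OVERHEAD   20UL  /* hypothetical fixed overhead of the range loop */
#define FLUSH_ALL_CYCLES     200UL  /* hypothetical cost of one global sfence.vma    */
#define PTW_REFILL_CYCLES   2200UL  /* hypothetical cost of refilling hot entries    */

/* y = kx + b: cost of flushing nr_ptes pages one at a time */
static unsigned long range_flush_cost(unsigned long nr_ptes)
{
        return PAGE_FLUSH_CYCLES * nr_ptes + RANGE_LOOP_OVERHEAD;
}

/* y = c: cost of flushing everything, including the later refills via PTW */
static unsigned long flush_all_cost(void)
{
        return FLUSH_ALL_CYCLES + PTW_REFILL_CYCLES;
}

int main(void)
{
        unsigned long nr_ptes, threshold = 0;

        /* The crossover of the two cost curves is the candidate threshold. */
        for (nr_ptes = 1; nr_ptes <= 256; nr_ptes++) {
                if (range_flush_cost(nr_ptes) > flush_all_cost()) {
                        threshold = nr_ptes;
                        break;
                }
        }

        printf("crossover (candidate tlb_flush_all_threshold): %lu pages\n",
               threshold);
        return 0;
}

With these placeholder numbers the program prints a crossover of 60 pages; that is
only meant to show the shape of the trade-off, not to suggest a tuned value, which
as discussed above would have to come from measurements on real hardware.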