Hi Alex, On 2024-01-04 6:33 AM, Alexandre Ghiti wrote: > On Tue, Jan 2, 2024 at 11:01 PM Samuel Holland > <samuel.holland@xxxxxxxxxx> wrote: >> >> If no other CPU is online, a local cache or TLB flush is sufficient. >> These checks can be constant-folded when SMP is disabled. >> >> Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx> >> --- >> >> Changes in v4: >> - New patch for v4 >> >> arch/riscv/mm/cacheflush.c | 4 +++- >> arch/riscv/mm/tlbflush.c | 4 +++- >> 2 files changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c >> index 47c485bc7df0..f7933ae88a55 100644 >> --- a/arch/riscv/mm/cacheflush.c >> +++ b/arch/riscv/mm/cacheflush.c >> @@ -21,7 +21,9 @@ void flush_icache_all(void) >> { >> local_flush_icache_all(); >> >> - if (riscv_use_sbi_for_rfence()) >> + if (num_online_cpus() < 2) >> + return; >> + else if (riscv_use_sbi_for_rfence()) >> sbi_remote_fence_i(NULL); >> else >> on_each_cpu(ipi_remote_fence_i, NULL, 1); >> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c >> index 2f18fe6fc4f3..37b3c93e3c30 100644 >> --- a/arch/riscv/mm/tlbflush.c >> +++ b/arch/riscv/mm/tlbflush.c >> @@ -73,7 +73,9 @@ static void __ipi_flush_tlb_all(void *info) >> >> void flush_tlb_all(void) >> { >> - if (riscv_use_sbi_for_rfence()) >> + if (num_online_cpus() < 2) >> + local_flush_tlb_all(); >> + else if (riscv_use_sbi_for_rfence()) >> sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID); >> else >> on_each_cpu(__ipi_flush_tlb_all, NULL, 1); >> -- >> 2.42.0 >> > > on_each_cpu() already deals correctly with a single online cpu, the > only thing to optimize here is the SBI rfence. So I'd move this new > test in sbi_remote_sfence_vma_asid() and sbi_remote_fence_i() to avoid > the superfluous M-mode entry when only one cpu is online by checking > the cpumask. And since sbi_remote_fence_i() is used in another What specific cpumask check are you suggesting? In sbi_remote_sfence_vma_asid() I don't think we can assume the local cpu is always included in the cpumask (which we _can_ assume here), so it would need to construct/compare the whole bitmap. That's much more expensive than the atomic load here. > function (flush_icache_mm()), we could also take advantage of this > optimization when only the local cpu must be flushed. flush_icache_mm() already has a "local" variable which it uses to skip the call to sbi_remote_fence_i(). Same with the broadcast check in __flush_tlb_range(). So no additional check is needed there. Those two functions, plus the two changed in this patch, are the only call sites of the sbi_*() functions. I think it makes more sense to optimize the four call sites than adding an additional check in the sbi_*() functions. Regards, Samuel