On Mon, Jul 31, 2023 at 07:39:30AM +0200, Arnd Bergmann wrote: > On Mon, Jul 31, 2023, at 02:49, Guo Ren wrote: > > On Mon, Jul 31, 2023 at 4:36 AM Arnd Bergmann <arnd@xxxxxxxx> wrote: > >> > >> On Sun, Jul 30, 2023, at 17:42, Emil Renner Berthing wrote: > >> > On Sun, 30 Jul 2023 at 17:11, Jisheng Zhang <jszhang@xxxxxxxxxx> wrote: > >> > >> >> > + > >> >> > static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size) > >> >> > { > >> >> > void *vaddr = phys_to_virt(paddr); > >> >> > > >> >> > +#ifdef CONFIG_RISCV_NONSTANDARD_CACHE_OPS > >> >> > + if (unlikely(noncoherent_cache_ops.wback)) { > >> >> > >> >> I'm worried about the performance impact here. > >> >> For unified kernel Image reason, RISCV_NONSTANDARD_CACHE_OPS will be > >> >> enabled by default, so standard CMO and T-HEAD's CMO platform's > >> >> performance will be impacted, because even an unlikely is put > >> >> here, the check action still needs to be done. > >> > > >> > On IRC I asked why not use a static key so the overhead is just a > >> > single nop when the standard CMO ops are available, but the consensus > >> > seemed to be that the flushing would completely dominate this branch. > >> > And on platforms with the standard CMO ops the branch be correctly > >> > predicted anyway. > >> > >> Not just the flushing, but also loading back the invalidated > >> cache lines afterwards is just very expensive. I don't think > >> you would be able to measure a difference between the static I read this as: the cache clean/inv is so expensive that the static key saving percentage is trivial, is this understanding right? this could be measured by writing a small benchmark kernel module which just calls cache clean/inv a buf(for example 1500Bytes)in a loop. > >> key and a correctly predicted branch on any relevant usecase here. > > Maybe we should move CMO & THEAD ops to the noncoherent_cache_ops, and > > only keep one of them. > > > > I prefer noncoherent_cache_ops, it's more maintance than ALTERNATIVE. > > I think moving the THEAD ops at the same level as all nonstandard > operations makes sense, but I'd still leave CMO as an explicit > fast path that avoids the indirect branch. This seems like the right > thing to do both for readability and for platforms on which the > indirect branch has a noticeable overhead. > > Arnd