Re: [PATCH v10 3/6] riscv: mm: dma-noncoherent: nonstandard cache operations support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 31, 2023, at 02:49, Guo Ren wrote:
> On Mon, Jul 31, 2023 at 4:36 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>>
>> On Sun, Jul 30, 2023, at 17:42, Emil Renner Berthing wrote:
>> > On Sun, 30 Jul 2023 at 17:11, Jisheng Zhang <jszhang@xxxxxxxxxx> wrote:
>>
>> >> > +
>> >> >  static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>> >> >  {
>> >> >       void *vaddr = phys_to_virt(paddr);
>> >> >
>> >> > +#ifdef CONFIG_RISCV_NONSTANDARD_CACHE_OPS
>> >> > +     if (unlikely(noncoherent_cache_ops.wback)) {
>> >>
>> >> I'm worried about the performance impact here.
>> >> For unified kernel Image reason, RISCV_NONSTANDARD_CACHE_OPS will be
>> >> enabled by default, so standard CMO and T-HEAD's CMO platform's
>> >> performance will be impacted, because even an unlikely is put
>> >> here, the check action still needs to be done.
>> >
>> > On IRC I asked why not use a static key so the overhead is just a
>> > single nop when the standard CMO ops are available, but the consensus
>> > seemed to be that the flushing would completely dominate this branch.
>> > And on platforms with the standard CMO ops the branch be correctly
>> > predicted anyway.
>>
>> Not just the flushing, but also loading back the invalidated
>> cache lines afterwards is just very expensive. I don't think
>> you would be able to measure a difference between the static
>> key and a correctly predicted branch on any relevant usecase here.
> Maybe we should move CMO & THEAD ops to the noncoherent_cache_ops, and
> only keep one of them.
>
> I prefer noncoherent_cache_ops, it's more maintance than ALTERNATIVE.

I think moving the THEAD ops at the same level as all nonstandard
operations makes sense, but I'd still leave CMO as an explicit
fast path that avoids the indirect branch. This seems like the right
thing to do both for readability and for platforms on which the
indirect branch has a noticeable overhead.

    Arnd




[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux