On 2021/1/29 21:54, Leizhen (ThunderTown) wrote: > > > On 2021/1/29 18:26, Arnd Bergmann wrote: >> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote: >>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown) >>> <thunder.leizhen@xxxxxxxxxx> wrote: >>>> On 2021/1/28 22:24, Arnd Bergmann wrote: >>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@xxxxxxxxxx> wrote: >>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile >>>>>> + >>>>>> +static void l3cache_maint_common(u32 range, u32 op_type) >>>>>> +{ >>>>>> + u32 reg; >>>>>> + >>>>>> + reg = readl(l3_ctrl_base + L3_MAINT_CTRL); >>>>>> + reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK); >>>>>> + reg |= range | op_type; >>>>>> + reg |= L3_MAINT_STATUS_START; >>>>>> + writel(reg, l3_ctrl_base + L3_MAINT_CTRL); >>>>> >>>>> Are there contents of L3_MAINT_CTRL that need to be preserved >>>>> across calls and can not be inferred? A 'readl()' is often expensive, >>>>> so it might be more efficient if you can avoid that. >>>> >>>> Right, this readl() can be replaced with readl_relaxed(). Thanks. >>>> >>>> I'll check and correct the readl() and writel() in other places. >>> >>> What I meant is that if you want to replace them, you should provide >>> performance numbers that show how much difference this makes >>> and add comments in the source code explaining how you proved that >>> the _relaxed() version is actually correct. >> >> Another clarification, as there are actually two independent >> points here: >> >> * if you can completely remove the readl() above and just write a >> hardcoded value into the register, or perhaps read the original >> value once at boot time, that is probably a win because it >> avoids one of the barriers in the beginning. The datasheet should >> tell you if there are any bits in the register that have to be >> preserved > > Code coupling will become very strong. > >> >> * Regarding the _relaxed() accessors, it's a lot harder to know >> whether that is safe, as you first have to show, in particular in case >> any of the accesses stop being guarded by the spinlock in that >> case, and whether there may be a case where you have to >> serialize the memory access against accesses that are still in the >> store queue or prefetched. >> >> Whether this matters at all depends mostly on the type of devices >> you are driving on your SoC. If you have any high-speed network >> interfaces that are unable to do cache coherent DMA, any extra >> instruction here may impact the number of packets you can transfer, >> but if all your high-speed devices are connected to a coherent >> interconnect, I would just go with the obvious approach and use >> the safe MMIO accessors everywhere. > > In fact, this driver has been running on an earlier version for several years > and has not received any feedback about the performance issue. So I didn't > try to optimize it when I first sent these patches. I had to reconsider it > until you noticed it. > > How about keeping it unchanged for the moment? It'll take a lot of time and > energy to retest. In the spirit of code excellence, it's still necessary to optimize it. Yesterday, my family urged me to go back, I wrote it in a hurry. > >> >> Arnd >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >> >> > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > . >