Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

"Leizhen (ThunderTown)" <thunder.leizhen@xxxxxxxxxx> · Sat, 30 Jan 2021 11:00:02 +0800

On 2021/1/29 21:54, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/1/29 18:26, Arnd Bergmann wrote:
>> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
>>> <thunder.leizhen@xxxxxxxxxx> wrote:
>>>> On 2021/1/28 22:24, Arnd Bergmann wrote:
>>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@xxxxxxxxxx> wrote:
>>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>>>>>> +
>>>>>> +static void l3cache_maint_common(u32 range, u32 op_type)
>>>>>> +{
>>>>>> +       u32 reg;
>>>>>> +
>>>>>> +       reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>>>>>> +       reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>>>>>> +       reg |= range | op_type;
>>>>>> +       reg |= L3_MAINT_STATUS_START;
>>>>>> +       writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>>>>>
>>>>> Are there contents of L3_MAINT_CTRL that need to be preserved
>>>>> across calls and can not be inferred? A 'readl()' is often expensive,
>>>>> so it might be more efficient if you can avoid that.
>>>>
>>>> Right, this readl() can be replaced with readl_relaxed(). Thanks.
>>>>
>>>> I'll check and correct the readl() and writel() in other places.
>>>
>>> What I meant is that if you want to replace them, you should provide
>>> performance numbers that show how much difference this makes
>>> and add comments in the source code explaining how you proved that
>>> the _relaxed() version is actually correct.
>>
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>>   hardcoded value into the register, or perhaps read the original
>>   value once at boot time, that is probably a win because it
>>   avoids one of the barriers in the beginning. The datasheet should
>>   tell you if there are any bits in the register that have to be
>>   preserved
> 
> Code coupling will become very strong.
> 
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>>   whether that is safe, as you first have to show, in particular in case
>>   any of the accesses stop being guarded by the spinlock in that
>>   case, and whether there may be a case where you have to
>>   serialize the memory access against accesses that are still in the
>>   store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
> 
> In fact, this driver has been running on an earlier version for several years
> and has not received any feedback about the performance issue. So I didn't
> try to optimize it when I first sent these patches. I had to reconsider it
> until you noticed it.
> 
> How about keeping it unchanged for the moment? It'll take a lot of time and
> energy to retest.

In the spirit of code excellence, it's still necessary to optimize it.
Yesterday, my family urged me to go back, I wrote it in a hurry.

> 
>>
>>        Arnd
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>>
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
>