Re: [PATCH 1/1] riscv: dts: microchip: correct L2 cache interrupts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 18/08/2022 09:17, Heinrich Schuchardt wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 8/18/22 09:03, Daire.McNamara@xxxxxxxxxxxxx wrote:
>> On Wed, 2022-08-17 at 18:04 +0000, Conor Dooley - M52691 wrote:
>>> Hey Heinrich,
>>> Interesting CC list you got there! Surprised the mailmap didn't sort
>>> out Atish & Krzysztof's addresses, but I think I've fixed them up.
>>>   I see Daire isn't there either so +CC him too.
>>>
>>> On 17/08/2022 14:25, Heinrich Schuchardt wrote:
>>>> EXTERNAL EMAIL: Do not click links or open attachments unless you
>>>> know the content is safe
>>>>
>>>> The "PolarFire SoC MSS Technical Reference Manual" documents the
>>>> following PLIC interrupts:
>>>>
>>>> 1 - L2 Cache Controller Signals when a metadata correction event
>>>> occurs
>>>> 2 - L2 Cache Controller Signals when an uncorrectable metadata
>>>> event occurs
>>>> 3 - L2 Cache Controller Signals when a data correction event occurs
>>>> 4 - L2 Cache Controller Signals when an uncorrectable data event
>>>> occurs
>>>>
>>>> This differs from the SiFive FU540 which only has three L2 cache
>>>> related
>>>> interrupts.
>>>>
>>>> The sequence in the device tree is defined by an enum:
>> in drivers/soc/sifive/sifive_l2_cache.c
>>>>
>>>>      enum {
>>>>              DIR_CORR = 0,
>>>>              DATA_CORR,
>>>>              DATA_UNCORR,
>>>>              DIR_UNCORR,
>>>>      };
>>>
>>> Nit: more accurately by the dt-binding:
>>>    interrupts:
>>>      minItems: 3
>>>      items:
>>>        - description: DirError interrupt
>>>        - description: DataError interrupt
>>>        - description: DataFail interrupt
>>>        - description: DirFail interrupt
>>>
>>> I do find the names in the enum to be a bit more understandable
>>> however,
>>> and ditto for the descriptions in our TRM... Maybe I should put that
>>> on
>>> my todo list of cleanups :)
>>>
>>>
>>>> So the correct sequence of the L2 cache interrupts is
>>>>
>>>>      interrupts = <1>, <3>, <4>, <2>;
>>>
>>> This looks correct to me. You mentioned on IRC that what you were
>>> seeing
>>> was a wall of
>>> L2CACHE: DataFail @ 0x00000000.0807FFD8
>>>  From a quick look at the driver, what seems to be happening here is
>>> that
>>> at some point (possibly before Linux even comes into the picture)
>>> there
>>> is an uncorrectable data error. Because the ordering in the dt is
>>> wrong,
>>> we read the wrong register and so the interrupt is never actually
>>> cleared. With this patch applied, I see a single DataFail right as
>>> the
>>> interrupt gets registed & nothing after that.
>>>
>>> I am not really sure what value there is in enabling that driver
>>> though,
>>> mostly just seems like a debugging tool & from our pov we would see
>>> the
>>> HSS running in the monitor core as being responsible for handling the
>>> l2-cache errors.
>>>
>>> @Daire, maybe you have an opinion here?
>> Likewise. The new ordering of the interrupts to what the driver expects
>> looks correct - as far as it goes. However, I'm not convinced enabling
>> the SiFive l2 cache driver out of the box makes sense. Using l2 cache
>> driver doesn't align terribly well with the current MPFS roadmap for
>> mgt of ECC errors.
>>>
>>> Patch LGTM, so I'll likely apply it in the next day or two, would
>>> just
>>> like to see what Daire has to say first.
>> If l2-cache controller is enabled, then interrupts should be connected
>> as per TRM.  I think this specific patch lgtm, ideally with a
>> 'disabled' stanza and it's up to individual MPFS customers/boards to
>> enable l2 cache controller if they want it.
> 
> Disabling the device in the device-tree is orthogonal to fixing the
> interrupt sequence. I would suggest that you use a separate patch for
> adding status = "disabled";.

Aye, not wrong there. At least from me, it was an observation on the
way you discovered that the bug existed. I'll apply this patch today
so - thanks for fixing it!
Conor.

> 
> Best regards
> 
> Heinrich
> 
>>>
>>>> Fixes: e35b07a7df9b ("riscv: dts: microchip: mpfs: Group tuples in
>>>> interrupt properties")
>>>
>>> BTW, it isn't really fixing this patch right? This is a dependency
>>> for
>>> backports to 5.15.
>>>
>>> Thanks for your patch,
>>> Conor.
>>>
>>>> Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE
>>>> board")
>>>> Cc: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Heinrich Schuchardt <
>>>> heinrich.schuchardt@xxxxxxxxxxxxx>
>>>> ---
>>>>   arch/riscv/boot/dts/microchip/mpfs.dtsi | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> b/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> index 496d3b7642bd..ec1de6344be9 100644
>>>> --- a/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> +++ b/arch/riscv/boot/dts/microchip/mpfs.dtsi
>>>> @@ -169,7 +169,7 @@ cctrllr: cache-controller@2010000 {
>>>>                          cache-size = <2097152>;
>>>>                          cache-unified;
>>>>                          interrupt-parent = <&plic>;
>>>> -                       interrupts = <1>, <2>, <3>;
>>>> +                       interrupts = <1>, <3>, <4>, <2>;
>>>>                  };
>>>>
>>>>                  clint: clint@2000000 {
>>>> -- 
>>>> 2.36.1
>>>>
> 





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux