On 18/08/2022 09:17, Heinrich Schuchardt wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On 8/18/22 09:03, Daire.McNamara@xxxxxxxxxxxxx wrote: >> On Wed, 2022-08-17 at 18:04 +0000, Conor Dooley - M52691 wrote: >>> Hey Heinrich, >>> Interesting CC list you got there! Surprised the mailmap didn't sort >>> out Atish & Krzysztof's addresses, but I think I've fixed them up. >>> I see Daire isn't there either so +CC him too. >>> >>> On 17/08/2022 14:25, Heinrich Schuchardt wrote: >>>> EXTERNAL EMAIL: Do not click links or open attachments unless you >>>> know the content is safe >>>> >>>> The "PolarFire SoC MSS Technical Reference Manual" documents the >>>> following PLIC interrupts: >>>> >>>> 1 - L2 Cache Controller Signals when a metadata correction event >>>> occurs >>>> 2 - L2 Cache Controller Signals when an uncorrectable metadata >>>> event occurs >>>> 3 - L2 Cache Controller Signals when a data correction event occurs >>>> 4 - L2 Cache Controller Signals when an uncorrectable data event >>>> occurs >>>> >>>> This differs from the SiFive FU540 which only has three L2 cache >>>> related >>>> interrupts. >>>> >>>> The sequence in the device tree is defined by an enum: >> in drivers/soc/sifive/sifive_l2_cache.c >>>> >>>> enum { >>>> DIR_CORR = 0, >>>> DATA_CORR, >>>> DATA_UNCORR, >>>> DIR_UNCORR, >>>> }; >>> >>> Nit: more accurately by the dt-binding: >>> interrupts: >>> minItems: 3 >>> items: >>> - description: DirError interrupt >>> - description: DataError interrupt >>> - description: DataFail interrupt >>> - description: DirFail interrupt >>> >>> I do find the names in the enum to be a bit more understandable >>> however, >>> and ditto for the descriptions in our TRM... Maybe I should put that >>> on >>> my todo list of cleanups :) >>> >>> >>>> So the correct sequence of the L2 cache interrupts is >>>> >>>> interrupts = <1>, <3>, <4>, <2>; >>> >>> This looks correct to me. You mentioned on IRC that what you were >>> seeing >>> was a wall of >>> L2CACHE: DataFail @ 0x00000000.0807FFD8 >>> From a quick look at the driver, what seems to be happening here is >>> that >>> at some point (possibly before Linux even comes into the picture) >>> there >>> is an uncorrectable data error. Because the ordering in the dt is >>> wrong, >>> we read the wrong register and so the interrupt is never actually >>> cleared. With this patch applied, I see a single DataFail right as >>> the >>> interrupt gets registed & nothing after that. >>> >>> I am not really sure what value there is in enabling that driver >>> though, >>> mostly just seems like a debugging tool & from our pov we would see >>> the >>> HSS running in the monitor core as being responsible for handling the >>> l2-cache errors. >>> >>> @Daire, maybe you have an opinion here? >> Likewise. The new ordering of the interrupts to what the driver expects >> looks correct - as far as it goes. However, I'm not convinced enabling >> the SiFive l2 cache driver out of the box makes sense. Using l2 cache >> driver doesn't align terribly well with the current MPFS roadmap for >> mgt of ECC errors. >>> >>> Patch LGTM, so I'll likely apply it in the next day or two, would >>> just >>> like to see what Daire has to say first. >> If l2-cache controller is enabled, then interrupts should be connected >> as per TRM. I think this specific patch lgtm, ideally with a >> 'disabled' stanza and it's up to individual MPFS customers/boards to >> enable l2 cache controller if they want it. > > Disabling the device in the device-tree is orthogonal to fixing the > interrupt sequence. I would suggest that you use a separate patch for > adding status = "disabled";. Aye, not wrong there. At least from me, it was an observation on the way you discovered that the bug existed. I'll apply this patch today so - thanks for fixing it! Conor. > > Best regards > > Heinrich > >>> >>>> Fixes: e35b07a7df9b ("riscv: dts: microchip: mpfs: Group tuples in >>>> interrupt properties") >>> >>> BTW, it isn't really fixing this patch right? This is a dependency >>> for >>> backports to 5.15. >>> >>> Thanks for your patch, >>> Conor. >>> >>>> Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE >>>> board") >>>> Cc: Conor Dooley <conor.dooley@xxxxxxxxxxxxx> >>>> Cc: stable@xxxxxxxxxxxxxxx >>>> Signed-off-by: Heinrich Schuchardt < >>>> heinrich.schuchardt@xxxxxxxxxxxxx> >>>> --- >>>> arch/riscv/boot/dts/microchip/mpfs.dtsi | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/arch/riscv/boot/dts/microchip/mpfs.dtsi >>>> b/arch/riscv/boot/dts/microchip/mpfs.dtsi >>>> index 496d3b7642bd..ec1de6344be9 100644 >>>> --- a/arch/riscv/boot/dts/microchip/mpfs.dtsi >>>> +++ b/arch/riscv/boot/dts/microchip/mpfs.dtsi >>>> @@ -169,7 +169,7 @@ cctrllr: cache-controller@2010000 { >>>> cache-size = <2097152>; >>>> cache-unified; >>>> interrupt-parent = <&plic>; >>>> - interrupts = <1>, <2>, <3>; >>>> + interrupts = <1>, <3>, <4>, <2>; >>>> }; >>>> >>>> clint: clint@2000000 { >>>> -- >>>> 2.36.1 >>>> >