On Sat, 2019-06-08 at 11:05 +0200, Borislav Petkov wrote: > On Sat, Jun 08, 2019 at 10:16:11AM +1000, Benjamin Herrenschmidt wrote: > > Those IP blocks don't need any SW coordination at runtime. The drivers > > don't share data nor communicate with each other. There is absolultely > > no reason to go down that path. > > Let me set one thing straight: the EDAC "subsystem" if you will - or > that pile of code which does error counting and reporting - has its > limitations in supporting one EDAC driver per platform. And whenever we > have two drivers loadable on a platform, we have to do dirty hacks like > > 301375e76432 ("EDAC: Add owner check to the x86 platform drivers") > > What that means is, that if you need to call EDAC logging routines or > whatnot from two different drivers, there's no locking, no nothing. So > it might work or it might set your cat on fire. Should we fix that then instead ? What are the big issues with adding some basic locking ? being called from NMIs ? If the separate drivers operate on distinct counters I don't see a big problem there. > IOW, having multiple separate "drivers" or representations of RAS > functionality using EDAC facilities is something that hasn't been > done. Well, almost. highbank_mc_edac.c and highbank_l2_edac.c is one > example but they make sure they don't step on each other's toes by using > different EDAC pieces - a device vs a memory controller abstraction. That sounds like a reasonable requirement. > And now the moment all of a sudden you decide you want for those > separate "drivers" to synchronize on something, you need to do something > hacky like the amd_register_ecc_decoder() thing, for example, because we > need to call into the EDAC memory controller driver to decode a DRAM ECC > error properly, while the rest of the error types get decoded somewhere > else... > > Then there comes the issue with code reuse - wouldn't it be great if a > memory controller driver can be shared between platform drivers instead of > copying it in both? > > We already do that - see fsl_ddr_edac.c which gets shared between PPC > *and* ARM. drivers/edac/skx_common.c is another example for Intel chips. > > Now, if you have a platform with 10 IP blocks which each have RAS > functionality, are you saying you'll do 10 different pieces called > > <platform_name>_<ip_block#>_edac.c > > ? > > And if <next_platform> has an old IP block with the old RAS > functionality, you load <platform_name>_<ip_block>_edac.c on the new > platform too? I'n not sure why <platform_name> ... Anyway, let's get back to the specific case of our Amazon platform here since it's a concrete example. Hanna, can you give us a reasonably exhaustive list of how many such "drivers" we'll want in the EDAC subsystem and whether you envision any coordination requirement between them or not ? Cheers, Ben.