Not probing the EDAC driver turned out to be a device tree issue, as Steve
suspected. Thanks to both Steve and York, this has been resolved, and the
backport is now logging ECC errors after injection.

I added the DDR qoriq-memory-controller entry, since we used a different
.dtsi file.

arch/arm64/boot/dts/freescale/...ls1043a.dtsi

ddr: memory-controller@1080000 {
        compatible = "fsl,qoriq-memory-controller";
        reg = <0x0 0x1080000 0x0 0x1000>;
        interrupts = <0 144 0x4>;
        big-endian;
};

I now need to collect and report CE and UE ECC errors, and to extend the
existing logging and reporting that I currently see.

After reviewing the following document, the system logging I get appears
different from what the kernel EDAC document describes. I need the level
of granularity described in the edac.txt file.

https://www.mjmwired.net/kernel/Documentation/edac.txt#173
(same as kernel/Documentation/edac.txt)

1) Can I gather the system logging described below in the edac.txt file
for Layerscape?

2) Is there anything similar to edac-utils but for ARM, or does sysfs
replace edac-utils, or is something else used?

3) What is currently used for collecting and reporting ECC errors for
ARM/EDAC beyond the kernel log and messages?

https://github.com/grondo/edac-utils

4) How is RAS reporting integrated into EDAC for error collection and
reporting?

5) Has there been a patch to prevent the EDAC sysfs API from reporting
bogus values?

See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html

  - The EDAC sysfs API will still report bogus values. So, userspace
    tools like edac-utils will still use the bogus data;
  - Add a new tracepoint-based way to get the binary information about
    the errors.

I need something that explains the fields below. This is the logging I
currently see with Layerscape EDAC:
[ 407.612311] EDAC FSL_DDR MC0: Err Detect Register: 0x80000004
[ 407.618182] EDAC FSL_DDR MC0: Faulty Data bit: 0
[ 407.622793] EDAC FSL_DDR MC0: Expected Data / ECC: 0x40c50901_40c50900 / 0x800000f0
[ 407.630443] EDAC FSL_DDR MC0: Captured Data / ECC: 0x40c50900_40c50901 / 0xf0
[ 407.637571] EDAC FSL_DDR MC0: Err addr: 0x3e0bfff50
[ 407.642440] EDAC FSL_DDR MC0: PFN: 0x003e0bff

This is the level of detail I need:

SYSTEM LOGGING
--------------

If logging for UEs and CEs is enabled, then system logs will contain
information indicating that errors have been detected:

EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
channel 1 "DIMM_B1": amd76x_edac

EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
channel 1 "DIMM_B1": amd76x_edac

The structure of the message is:
        the memory controller                   (MC0)
        Error type                              (CE)
        memory page                             (0x283)
        offset in the page                      (0xce0)
        the byte granularity                    (grain 8)
                or resolution of the error
        the error syndrome                      (0xb741)
        memory row                              (row 0)
        memory channel                          (channel 1)
        DIMM label, if set prior                (DIMM B1)
        and then an optional, driver-specific message that may
                have additional information.

Both UEs and CEs with no info will lack all but memory controller,
error type, a notice of "no info" and then an optional,
driver-specific error message.

On Mon, Nov 19, 2018 at 10:48 AM York Sun <york.sun@xxxxxxx> wrote:
>
> On 11/19/18 8:38 AM, Tracy Smith wrote:
> > Steve, you were correct, there wasn't a device tree entry for the
> > qoriq memory controller in
> > arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi. I added it making it
> > identical to the fsl-ls1046s.dtsi, which should have the same memory
> > controller and entry as the ls1043a. I added this but it didn't make
> > a difference as far as being able to call the probe function.
> > I'm now checking the mpc85xx_edac.c dtsi entry for comparison since
> > York used the mpc85xx as the basis for the layerscape, but there is
> > something else missing preventing the probe function from being called.
> >
> > @York
> > What is your entry for
> > /proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible
>
> EDAC driver doesn't check IFC. Are you debugging EDAC for memory controller?
>
> >
> > @York
> > cat /proc/device-tree/compatible entry is this, is this correct?
> > fsl,ls1043a-rdbfsl,ls1043a
>
> Once again, you are using your modified code on your own board. So it is
> not ls1043ardb. This compatible has nothing to do with EDAC driver.
>
> I cannot help you with ls1043ardb because the real ls1043ardb board
> doesn't support ECC. The closest board I have is ls1046ardb.
>
> >
> > ddr: memory-controller@1080000 {
> > compatible = "fsl,qoriq-memory-controller";
> > reg = <0x0 0x1080000 0x0 0x1000>;
> > interrupts = <0 144 0x4>;
> > big-endian;
> > };
>
> This is your source code, not your final device tree. Please learn to
> use "fdt" command under U-Boot to dump your device tree before booting
> Linux, or check after Linux is up. For your reference, on my ls1046ardb,
> I have
>
> # cat /proc/device-tree/soc/memory-controller@1080000/compatible
> fsl,qoriq-memory-controller
>
> York
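
For reference, here is a minimal sketch of how the FSL_DDR lines from my
log earlier in this message could be grouped into one record per error,
with a page and offset derived from "Err addr" to approximate the
"page ..., offset ..." fields that edac.txt describes. This is only a
userspace illustration in Python, not part of the driver or of edac-utils.
Assumptions: each report starts with the "Err Detect Register" line (as in
my sample), pages are 4 KiB, and the name parse_fsl_ddr_log is made up for
this example.

```python
import re

# One FSL_DDR report line, in the shape seen in the kernel log above:
#   [ 407.637571] EDAC FSL_DDR MC0: Err addr: 0x3e0bfff50
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+EDAC FSL_DDR MC(?P<mc>\d+):\s+"
    r"(?P<key>.+?):\s+(?P<value>.+?)\s*$"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_fsl_ddr_log(text):
    """Group FSL_DDR log lines into one dict per reported error.

    Hypothetical helper: a new record is started at every
    "Err Detect Register" line, which is an assumption based on the
    ordering in the sample log.
    """
    records = []
    current = None
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an FSL_DDR line, ignore it
        if m["key"] == "Err Detect Register":
            current = {"mc": int(m["mc"]), "timestamp": float(m["ts"])}
            records.append(current)
        if current is None:
            continue  # field seen before any report header
        current[m["key"]] = m["value"]

    # Derive page and in-page offset from the physical error address.
    for rec in records:
        if "Err addr" in rec:
            addr = int(rec["Err addr"], 16)
            rec["page"] = addr >> PAGE_SHIFT
            rec["offset"] = addr & ((1 << PAGE_SHIFT) - 1)
    return records
```

For the sample log in this message, the derived page is 0x3e0bff, which
matches the PFN the driver already prints, and the offset within that page
is 0xf50. Grain, syndrome, row, channel and DIMM label are not present in
these lines, so this sketch cannot supply them.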