On Fri, Oct 30, 2015 at 05:06:06PM +0000, Mark Rutland wrote:
> > * Correctable errors does not generate any interrupt:
> > If we have to implement error parsing inside the firmware then work need
> > to be split between OS and firmware. Maybe OS can call SMC instruction to
> > dial into firmware and then firmware can check error syndrome registers;
> > if it finds correctable error then build HEST table. This method will introduce
> > performance issue because it require OS executing SMC every 100ms or so to just
> > poll for correctable error. If you have any other recommendation then please share it.
>
> I agree that this is a problem, and is an unfortunate hardware
> limitation.
>
> I am still wary of making use of IMPLEMENTATION DEFINED features like
> this in the kernel.

Well, you could do all the correctable error collection in the firmware
and only report those errors to the OS when they overflow or reach a
certain threshold.

The idea behind it is that you don't really want to upset the user about
*every* correctable error happening because it was correctable and the
hardware, well, doh, corrected it. No problem.

But when those errors start repeating and hitting the same DIMM and
addresses in close proximity, there might be a problem which you should
report.

Btw, we have been looking at doing something like that on x86:

https://lkml.kernel.org/r/1404242623-10094-1-git-send-email-bp@xxxxxxxxx

and one of these days I'll upstream the damn thing! :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
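
To make the thresholding idea above a bit more concrete, here is a rough sketch of how a collector could count corrected errors per page and only raise a notification once a page keeps getting hit. It is an illustration only, not the x86 patch linked above and not any existing firmware interface: the names (ce_tracker, ce_record, ce_decay) and the constants (64 slots, threshold of 4, 4 KiB granularity) are all made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All values below are hypothetical and chosen only for illustration. */
#define CE_SLOTS      64   /* number of pages tracked at once */
#define CE_THRESHOLD  4    /* hits on one page before notifying; must be > 1 */
#define CE_PAGE_SHIFT 12   /* group nearby addresses by 4 KiB page */

struct ce_slot {
	uint64_t page;   /* physical address >> CE_PAGE_SHIFT */
	uint32_t count;  /* corrected errors seen on this page */
	bool used;
};

struct ce_tracker {
	struct ce_slot slots[CE_SLOTS];
};

/*
 * Record one corrected error at physical address @addr.
 * Returns true when the page it falls in has reached CE_THRESHOLD,
 * i.e. when the error is worth reporting to the OS. The counter is
 * reset on a report so the page has to misbehave again to re-trigger.
 */
bool ce_record(struct ce_tracker *t, uint64_t addr)
{
	uint64_t page = addr >> CE_PAGE_SHIFT;
	struct ce_slot *free_slot = NULL;
	struct ce_slot *victim = &t->slots[0];
	struct ce_slot *s;
	size_t i;

	for (i = 0; i < CE_SLOTS; i++) {
		s = &t->slots[i];

		if (s->used && s->page == page) {
			if (++s->count >= CE_THRESHOLD) {
				s->count = 0;
				return true;
			}
			return false;
		}

		if (!s->used) {
			if (!free_slot)
				free_slot = s;
		} else if (s->count < victim->count) {
			victim = s;  /* least-hit page so far */
		}
	}

	/* New page: take a free slot, else evict the least-hit page. */
	s = free_slot ? free_slot : victim;
	s->page = page;
	s->count = 1;
	s->used = true;
	return false;
}

/*
 * Call periodically to age the counters, so that isolated errors
 * spread over a long time never add up to a report.
 */
void ce_decay(struct ce_tracker *t)
{
	size_t i;

	for (i = 0; i < CE_SLOTS; i++) {
		t->slots[i].count /= 2;
		if (!t->slots[i].count)
			t->slots[i].used = false;
	}
}
```

With something like this, the collector would call ce_record() for each corrected error it finds in the syndrome registers and only signal the OS when it returns true; everything below the threshold stays silent. Whether the granularity should be a page, a DIMM or a row is a policy question this sketch does not try to answer.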