I am working to detect ECC errors. In the past we have used EDAC to
detect the errors. I have started working with folks in the EDAC space;
they are looking at decoding MCE-level errors and filling in the EDAC
sysfs space.
Regardless of whether the implementation is MCE- or EDAC-based, I need
access to the PCI space to decode the ECC memory errors.
ECC errors on Nehalem are reported as machine check events; you don't need
special PCI devices to read those. The kernel does it by default.
The latest mcelog git version is also able to decode the DIMM numbers
based on that.
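A minimal sketch of what "reported as machine check events" means in
practice. It assumes you already have a raw MCi_STATUS value (for example,
as printed in mcelog output) and only classifies it. The bit positions
(VAL = 63, UC = 61, MCA error code = 15:0, memory-controller compound code
000F 0000 1MMM CCCC) follow the Intel SDM machine-check chapter. The helper
names are hypothetical. This is not mcelog's DIMM decoding, which also needs
the platform-specific ADDR/MISC registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS flag bits (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* error was not corrected      */

/* True if the bank reported a valid error. */
static bool mci_valid(uint64_t status)
{
    return (status & MCI_STATUS_VAL) != 0;
}

/* True if the error was uncorrected (UC set); false means corrected. */
static bool mci_uncorrected(uint64_t status)
{
    return (status & MCI_STATUS_UC) != 0;
}

/* True if the 16-bit MCA error code matches the memory-controller
 * compound format 000F 0000 1MMM CCCC (the F filtering bit is ignored). */
static bool mci_is_memctrl(uint64_t status)
{
    uint16_t code = (uint16_t)(status & 0xFFFF);
    return (code & 0xEF80) == 0x0080;
}

/* MMM field: memory transaction type (bits 6:4 of the error code). */
static unsigned mci_mem_transaction(uint64_t status)
{
    return (unsigned)((status >> 4) & 0x7);
}

/* CCCC field: channel number, 0xF meaning "not specified". */
static unsigned mci_mem_channel(uint64_t status)
{
    return (unsigned)(status & 0xF);
}
```

For instance, a status of 0x9000000000000091 (VAL and EN set, error code
0x0091) classifies as a valid, corrected memory-controller read error on
channel 1. Mapping that channel to an actual DIMM is the platform-specific
part that mcelog handles.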
-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html