On Tue, Sep 01, 2020 at 03:01:40PM +0100, Shiju Jose wrote:
> When the CPU correctable errors reported on an ARM64 CPU core too often,
> it should be isolated. Add the CPU correctable error collector to
> store the CPU correctable error count.
>
> When the correctable error count for a CPU exceed the threshold
> value in a short time period, it will try to isolate the CPU core.
> The threshold value, time period etc are configurable.
>
> Implementation details is added in the file.
>
> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
> ---
>  Documentation/ABI/testing/debugfs-cpu-cec |  22 ++
>  arch/arm64/ras/Kconfig                    |   8 +
>  drivers/acpi/apei/ghes.c                  |  30 +-
>  drivers/ras/Kconfig                       |   1 +
>  drivers/ras/Makefile                      |   1 +
>  drivers/ras/cpu_cec.c                     | 393 ++++++++++++++++++++++

So instead of adding the ability to collect other error types to the
CEC, you're duplicating the CEC itself?!

Why?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette