On Fri, Jan 15, 2021 at 11:06:30AM +0000, Shiju Jose wrote: > L2 cache corrected errors are detected occasionally on few of > our ARM64 hardware boards. Though it is rare, the probability of > the CPU cache errors frequently occurring can't be avoided. > The earlier failure detection by monitoring the cache corrected > errors for the frequent occurrences and taking preventive > action could prevent more serious hardware faults. > > On Intel architectures, cache corrected errors are reported and > the affected cores are offline in the architecture specific method. > http://www.mcelog.org/cache.html > > However for the firmware-first error reporting, specifically on > ARM64 architectures, there is no provision present for reporting > the cache corrected error count to the user-space and taking > preventive action such as offline the affected cores. How hard was it to write that in your first submission? What do you think would be the best way to persuade a patch reviewer/maintainer to take a look at your submission? > >Why a separate Kconfig item? > CONFIG_EDAC_GHES_CPU_CACHE_ERROR is added to make this > feature optional only for the platforms which need this and supported. > > > > >> + depends on EDAC_GHES depends on EDAC_GHES hardly expresses which platforms need it/support it. If anything, depends on ARM64. > >Init stuff belongs into ghes_scan_system(). > > > Did you mean calling ghes_edac_create_cpu_device() in the ghes_scan_system()? I mean, all hardware discovery needs to happen in ghes_scan_system - you don't need to call those from outside the driver, in ghes_edac_register(). -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette