Hi Boris, >-----Original Message----- >From: Borislav Petkov [mailto:bp@xxxxxxxxx] >Sent: 18 January 2021 18:37 >To: Shiju Jose <shiju.jose@xxxxxxxxxx> >Cc: linux-edac@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; linux- >kernel@xxxxxxxxxxxxxxx; james.morse@xxxxxxx; >mchehab+huawei@xxxxxxxxxx; tony.luck@xxxxxxxxx; rjw@xxxxxxxxxxxxx; >lenb@xxxxxxxxxx; rrichter@xxxxxxxxxxx; Jonathan Cameron ><jonathan.cameron@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>; >linuxarm@xxxxxxxxxxxxx >Subject: Re: [RFC PATCH 1/2] EDAC/ghes: Add EDAC device for the CPU >caches > >On Fri, Jan 15, 2021 at 11:06:30AM +0000, Shiju Jose wrote: >> L2 cache corrected errors are detected occasionally on few of our >> ARM64 hardware boards. Though it is rare, the probability of the CPU >> cache errors frequently occurring can't be avoided. >> The earlier failure detection by monitoring the cache corrected errors >> for the frequent occurrences and taking preventive action could >> prevent more serious hardware faults. >> >> On Intel architectures, cache corrected errors are reported and the >> affected cores are offline in the architecture specific method. >> http://www.mcelog.org/cache.html >> >> However for the firmware-first error reporting, specifically on >> ARM64 architectures, there is no provision present for reporting the >> cache corrected error count to the user-space and taking preventive >> action such as offline the affected cores. > >How hard was it to write that in your first submission? What do you think >would be the best way to persuade a patch reviewer/maintainer to take a >look at your submission? > >> >Why a separate Kconfig item? >> CONFIG_EDAC_GHES_CPU_CACHE_ERROR is added to make this feature >> optional only for the platforms which need this and supported. >> >> > >> >> + depends on EDAC_GHES > >depends on EDAC_GHES hardly expresses which platforms need it/support it. > >If anything, depends on ARM64. Sure. I will add dependency on ARM64. This EDAC code for the cache errors is architecture independent for the firmware-first error reporting and could be used for other architectures, though now we need it for the ARM64. > >> >Init stuff belongs into ghes_scan_system(). >> > >> Did you mean calling ghes_edac_create_cpu_device() in the >ghes_scan_system()? > >I mean, all hardware discovery needs to happen in ghes_scan_system >- you don't need to call those from outside the driver, in >ghes_edac_register(). sure. Will modify. > >-- >Regards/Gruss, > Boris. Thanks, Shiju