On Tue, Jun 25, 2024 at 02:56:21PM -0500, Avadhut Naik wrote:
> Currently, exporting new additional machine check error information
> involves adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
>
> However, as new MSRs are being added (and will be added in the future)
> by CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some
> CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
> different CPU vendors may export the additional information in varying
> sizes.
>
> The problem particularly intensifies since struct mce is exposed to
> userspace as part of UAPI. Its bloating through vendor-specific data
> should be avoided to limit the information being sent out to userspace.
>
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specific data, if any, can now be
> exported through a union within the wrapper structure and through
> __dynamic_array in mce_record tracepoint.
>
> Furthermore, new internal kernel fields can be added to the wrapper
> struct without impacting the user space API.
>
> Note: Some Checkpatch checks have been ignored to maintain coding style.
>
> [Yazen: Add last commit message paragraph.]
>
> Suggested-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> ---
>  arch/x86/include/asm/mce.h              |   6 +-
>  arch/x86/kernel/cpu/mce/amd.c           |  29 ++--
>  arch/x86/kernel/cpu/mce/apei.c          |  54 +++----
>  arch/x86/kernel/cpu/mce/core.c          | 178 +++++++++++++-----------
>  arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
>  arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
>  arch/x86/kernel/cpu/mce/inject.c        |   4 +-
>  arch/x86/kernel/cpu/mce/internal.h      |   4 +-
>  drivers/acpi/acpi_extlog.c              |   2 +-
>  drivers/acpi/nfit/mce.c                 |   2 +-
>  drivers/edac/i7core_edac.c              |   2 +-
>  drivers/edac/igen6_edac.c               |   2 +-
>  drivers/edac/mce_amd.c                  |   2 +-
>  drivers/edac/pnd2_edac.c                |   2 +-
>  drivers/edac/sb_edac.c                  |   2 +-
>  drivers/edac/skx_common.c               |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
>  drivers/ras/amd/fmpm.c                  |   2 +-
>  drivers/ras/cec.c                       |   2 +-
>  include/trace/events/mce.h              |  42 +++---
>  20 files changed, 199 insertions(+), 162 deletions(-)

Ok, did some minor massaging but otherwise looks ok now.

Tony, any comments?

You ok with this, would that fit any Intel-specific vendor fields too or do
you need some additional Intel-specific changes?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
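
For anyone following along, the wrapper idea from the commit message can be
sketched roughly as below. This is only an illustration that builds in
userspace, not the code from the patch: struct mce_sketch is a heavily
trimmed stand-in for the real UAPI struct mce, and the names
mce_hw_err_sketch, to_hw_err(), set_amd_syndromes(), synd1 and synd2 are
all hypothetical. The point is that vendor-specific fields live in a union
next to the embedded record, so the userspace-visible record itself does
not grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for the UAPI-visible struct mce. */
struct mce_sketch {
	uint64_t status;	/* bank status register value */
	uint64_t addr;		/* bank address register value */
	uint8_t  bank;		/* bank number */
};

/*
 * Kernel-internal wrapper (hypothetical layout): the UAPI record plus
 * vendor-specific extras that are never copied into the UAPI record.
 */
struct mce_hw_err_sketch {
	struct mce_sketch m;

	union {
		struct {
			uint64_t synd1;	/* hypothetical vendor field */
			uint64_t synd2;	/* hypothetical vendor field */
		} amd;
	} vendor;
};

/* Recover the wrapper from a pointer to the embedded record. */
static inline struct mce_hw_err_sketch *to_hw_err(struct mce_sketch *m)
{
	return (struct mce_hw_err_sketch *)
		((char *)m - offsetof(struct mce_hw_err_sketch, m));
}

/* Store vendor data in the wrapper without touching the UAPI record. */
static inline void set_amd_syndromes(struct mce_hw_err_sketch *err,
				     uint64_t s1, uint64_t s2)
{
	assert(err != NULL);
	err->vendor.amd.synd1 = s1;
	err->vendor.amd.synd2 = s2;
}
```

Code that today receives only a pointer to the embedded record could call
something like to_hw_err() to reach the vendor data, while anything that
copies the record out to userspace keeps copying just the embedded part.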