[PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

"Naik, Avadhut" <avadnaik@xxxxxxx> · Wed, 26 Jun 2024 12:24:20 -0500

On 6/26/2024 06:10, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> 
> "viz."?
> 
Right. Will mention Zen4 instead of Genoa.

> Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
> more widespread.
> 
>> MCA_SYND1 and MCA_SYND2.
>>
>> These registers will include supplemental error information in addition
>> to the existing MCA_SYND register. The data within the registers is
>> considered valid if MCA_STATUS[SyndV] is set.
> 
> From here...
> 
>> Add fields for these registers as vendor-specific error information
>> in struct mce_hw_err. Save and print these registers wherever
>> MCA_STATUS[SyndV]/MCA_SYND is currently used.
>>
>> Also, modify the mce_record tracepoint to export these new registers
>> through __dynamic_array. While the sizeof() operator has been used to
>> determine the size of this __dynamic_array, the same, if needed in the
>> future can be substituted by caching the size of vendor-specific error
>> information as part of struct mce_hw_err.
> 
> ... to here this text explains what the patch does. I guess it is time for my
> boilerplate text again:
> 
> Do not talk about *what* the patch is doing in the commit message - that
> should be obvious from the diff itself. Rather, concentrate on the *why*
> it needs to be done.
> 
> Imagine one fine day you're doing git archeology, you find the place in
> the code about which you want to find out why it was changed the way it 
> is now.
> 
> You do git annotate <filename> ... find the line, see the commit id and
> you do:
> 
> git show <commit id>
> 
> You read the commit message and there's just gibberish and nothing's
> explaining *why* that change was done. And you start scratching your
> head, trying to figure out why. Because the damn commit message is worth
> sh*t.
> 
> This happens to us maintainers at least once a week. Well, I don't want
> that to happen in my tree anymore.
> 
> You catch my drift? :)
> 
> So, now, how are those new syndromes going to be used in the tracepoint and
> why do we want them there?
> 
Yes, I catch your drift. Will reword the commit message to explain that the
new syndrome registers are going to be exported through the tracepoint
in a dynamic array, as they are vendor-specific, so that usersapce error
decoding tools can retrieve the supplemental error information within them.

>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> 
> This goes...
> 
>>
>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> 
> Yes, you did but now your SOB chain is wrong:
> 
>> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> 
> This tells me Avadhut is the author, Yazen handled it and he's sending it to
> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> 
> Audit all patches pls.
> 
Wasn't aware of this chronology. Thanks for this information!
So, IIUC, the sequence for this patch should be as follows?

Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>

>> ---
> 
> ... right under those three "---" as such notes do not belong in the commit
> message. Remember that for the future.
> 
Okay. Will move the note here.

>>  arch/x86/include/asm/mce.h     | 12 ++++++++++++
>>  arch/x86/kernel/cpu/mce/amd.c  |  5 ++++-
>>  arch/x86/kernel/cpu/mce/core.c | 24 +++++++++++++++++-------
>>  drivers/edac/mce_amd.c         | 10 +++++++---
>>  include/trace/events/mce.h     |  9 +++++++--
>>  5 files changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index e955edb22897..2b43ba37bbda 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -122,6 +122,9 @@
>>  #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
>>  #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
>>  #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
>> +/* Registers MISC2 to MISC4 are at offsets B to D. */
>> +#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
>> +#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
>>  #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
>> @@ -132,6 +135,8 @@
>>  #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
>> +#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
>> +#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
>>  
>>  #define XEC(x, mask)			(((x) >> 16) & mask)
>>  
>> @@ -189,6 +194,13 @@ enum mce_notifier_prios {
>>  
>>  struct mce_hw_err {
>>  	struct mce m;
>> +
>> +	union vendor_info {
>> +		struct {
>> +			u64 synd1;
>> +			u64 synd2;
>> +		} amd;
> 
> I presume the intent here is for Intel or other vendors to add their
> vendor-specific stuff here too?
> 
> I'm also expecting that shared fields will be promoted up to the common struct
> namespace. Pls add a short comment explaining what the goal with that struct
> is.
> 
Yes, other vendors can export their vendor-specific data through thier own
structure within the union. Yes, shared fields can be promoted to the common
structure. Will add a comment to explain the endgoal.

>> +	} vi;
> 
> Call that "vendor" so that in the code you can have
> 
> 	err.vendor.amd.
> 
> or
> 
> 	err.vendor.intel.
> 
> and so on so that it is perfectly clear what this is.
> 
Will do.
>>  };
>>  
>>  struct notifier_block;
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index cb7dc0b1aa50..fc69d244ca7f 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -799,8 +799,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>>  	if (mce_flags.smca) {
>>  		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> +		if (m->status & MCI_STATUS_SYNDV) {
>>  			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
>> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vi.amd.synd1);
>> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vi.amd.synd2);
>> +		}
>>  	}
>>  
>>  	mce_log(&err);
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 6225143b9b14..3bb0f8b39f97 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -189,6 +189,10 @@ static void __print_mce(struct mce_hw_err *err)
>>  	if (mce_flags.smca) {
>>  		if (m->synd)
>>  			pr_cont("SYND %llx ", m->synd);
>> +		if (err->vi.amd.synd1)
>> +			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
>> +		if (err->vi.amd.synd2)
>> +			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
>>  		if (m->ipid)
>>  			pr_cont("IPID %llx ", m->ipid);
>>  	}
>> @@ -639,8 +643,10 @@ static struct notifier_block mce_default_nb = {
>>  /*
>>   * Read ADDR and MISC registers.
>>   */
>> -static noinstr void mce_read_aux(struct mce *m, int i)
>> +static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
> 
> This whole conversion to struct mce_hw_err here belongs logically into patch
> 1.
> 
Had considered this. But struct mce_hw_err *err wouldn't really be used in
mce_read_aux() in patch 1. Only struct mce m, which is already available, will
be used.
Hence, deferred the change to this patch where usage of struct mce_hw_err *err
is actually introduced in mce_read_aux().

Do you prefer having this change in patch 1 instead?

>>  {
>> +	struct mce *m = &err->m;
>> +
>>  	if (m->status & MCI_STATUS_MISCV)
>>  		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
>>  
>> @@ -662,8 +668,11 @@ static noinstr void mce_read_aux(struct mce *m, int i)
>>  	if (mce_flags.smca) {
>>  		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> +		if (m->status & MCI_STATUS_SYNDV) {
>>  			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
>> +			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
>> +			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
>> +		}
>>  	}
>>  }
>>  
>> @@ -766,7 +775,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>>  		if (flags & MCP_DONTLOG)
>>  			goto clear_it;
>>  
>> -		mce_read_aux(m, i);
>> +		mce_read_aux(&err, i);
>>  		m->severity = mce_severity(m, NULL, NULL, false);
>>  		/*
>>  		 * Don't get the IP here because it's unlikely to
>> @@ -903,9 +912,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
>>   * Do a quick check if any of the events requires a panic.
>>   * This decides if we keep the events around or clear them.
>>   */
>> -static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
>> +static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
>>  					  struct pt_regs *regs)
>>  {
>> +	struct mce *m = &err->m;
>>  	char *tmp = *msg;
>>  	int i;
>>  
>> @@ -923,7 +933,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
>>  
>>  		m->bank = i;
>>  		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
>> -			mce_read_aux(m, i);
>> +			mce_read_aux(err, i);
>>  			*msg = tmp;
>>  			return 1;
>>  		}
>> @@ -1321,7 +1331,7 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
>>  		if (severity == MCE_NO_SEVERITY)
>>  			continue;
>>  
>> -		mce_read_aux(m, i);
>> +		mce_read_aux(err, i);
>>  
>>  		/* assuming valid severity level != 0 */
>>  		m->severity = severity;
>> @@ -1522,7 +1532,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>  	final = this_cpu_ptr(&hw_errs_seen);
>>  	final->m = *m;
>>  
>> -	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
>> +	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
>>  
>>  	barrier();
>>  
>> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
>> index c5fae99de781..69e12cb2f0de 100644
>> --- a/drivers/edac/mce_amd.c
>> +++ b/drivers/edac/mce_amd.c
>> @@ -792,7 +792,8 @@ static const char *decode_error_status(struct mce *m)
>>  static int
>>  amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  {
>> -	struct mce *m = &((struct mce_hw_err *)data)->m;
>> +	struct mce_hw_err *err = (struct mce_hw_err *)data;
>> +	struct mce *m = &err->m;
>>  	unsigned int fam = x86_family(m->cpuid);
>>  	int ecc;
>>  
>> @@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
>>  		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> -			pr_cont(", Syndrome: 0x%016llx", m->synd);
>> +		if (m->status & MCI_STATUS_SYNDV) {
>> +			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
>> +			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
>> +				 err->vi.amd.synd1, err->vi.amd.synd2);
>> +		}
>>  
>>  		pr_cont("\n");
>>  
>> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
>> index 65aba1afcd07..9e7211eddbca 100644
>> --- a/include/trace/events/mce.h
>> +++ b/include/trace/events/mce.h
>> @@ -43,6 +43,8 @@ TRACE_EVENT(mce_record,
>>  		__field(	u8,		bank		)
>>  		__field(	u8,		cpuvendor	)
>>  		__field(	u32,		microcode	)
>> +		__field(	u8,		len	)
>> +		__dynamic_array(u8, v_data, sizeof(err->vi))
>>  	),
>>  
>>  	TP_fast_assign(
>> @@ -65,9 +67,11 @@ TRACE_EVENT(mce_record,
>>  		__entry->bank		= err->m.bank;
>>  		__entry->cpuvendor	= err->m.cpuvendor;
>>  		__entry->microcode	= err->m.microcode;
>> +		__entry->len		= sizeof(err->vi);
>> +		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));
> 
> So that vendor data layout - is that ABI too? Or are we free to shuffle the
> fields around in the future or even remove some?
> 
> This all needs to be specified somewhere explicitly so that nothing relies on
> that layout.
> 
> And I'm not sure that that's enough because when userspace tools start using
> them, then they're practically an ABI so you can't change them even if you
> wanted to.
> 
> So is libtraceevent or all the other libraries going to parse this as a blob
> and it is always going to remain such?
> 
> But then the tools which interpret it need to know its layout and if it
> changes, perhaps check kernel version which then becomes RealUgly(tm).
> 
> So you might just as well dump the separate fields one by one, without
> a dynamic array.
> 
> Or do a dynamic array but specify that their layout in struct
> mce_hw_er.vendor.amd are cast in stone so that we're all clear on what goes
> where.
> 
> Questions over questions...
> 
Should we document this where struct mce_hw_err is defined, in
arch/x86/include/asm/mce.h? Or do you have any other recommendations?

IIUC, the libtraceevent library relies on tracepoint's format in tracefs. Below
is the format with this patchset incorporated.

[root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format 
name: mce_record
ID: 113
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 mcgcap;       offset:8;       size:8; signed:0;
        field:u64 mcgstatus;    offset:16;      size:8; signed:0;
        field:u64 status;       offset:24;      size:8; signed:0;
        field:u64 addr; offset:32;      size:8; signed:0;
        field:u64 misc; offset:40;      size:8; signed:0;
        field:u64 synd; offset:48;      size:8; signed:0;
        field:u64 ipid; offset:56;      size:8; signed:0;
        field:u64 ip;   offset:64;      size:8; signed:0;
        field:u64 tsc;  offset:72;      size:8; signed:0;
        field:u64 ppin; offset:80;      size:8; signed:0;
        field:u64 walltime;     offset:88;      size:8; signed:0;
        field:u32 cpu;  offset:96;      size:4; signed:0;
        field:u32 cpuid;        offset:100;     size:4; signed:0;
        field:u32 apicid;       offset:104;     size:4; signed:0;
        field:u32 socketid;     offset:108;     size:4; signed:0;
        field:u8 cs;    offset:112;     size:1; signed:0;
        field:u8 bank;  offset:113;     size:1; signed:0;
        field:u8 cpuvendor;     offset:114;     size:1; signed:0;
        field:u32 microcode;    offset:116;     size:4; signed:0;
        field:u8 len;   offset:120;     size:1; signed:0;
        field:__data_loc u8[] v_data;   offset:124;     size:4; signed:0;

print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid, REC->microcode, __print_array(__get_dynamic_array(v_data), REC->len / 8, 8)

So, yes, the tools which interpret the vendor data need to aware of its layout
if things like FRUTEXT are to be decoded from the data.

Just FYI, patch adding support for this in rasdaemon, has already been merged in.
https://github.com/mchehab/rasdaemon/pull/122/commits/926c2b39c6386d0a1bf4232977f9fd7e37850361

> Thx.
> 

-- 
Thanks,
Avadhut Naik