On Fri, Mar 07, 2014 at 12:44:16PM +0100, Borislav Petkov wrote: [...] > > +static void mem_err_location(struct cper_sec_mem_err *mem) > > +{ > > + char *p; > > + u32 n = 0; > > + > > + memset(mem_location, 0, LOC_LEN); > > + p = mem_location; > > + if (mem->validation_bits & CPER_MEM_VALID_NODE) > > + n += sprintf(p + n, " node: %d", mem->node); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_CARD) > > + n += sprintf(p + n, " card: %d", mem->card); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_MODULE) > > + n += sprintf(p + n, " module: %d", mem->module); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_RANK_NUMBER) > > + n += sprintf(p + n, " rank: %d", mem->rank); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_BANK) > > + n += sprintf(p + n, " bank: %d", mem->bank); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_DEVICE) > > + n += sprintf(p + n, " device: %d", mem->device); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_ROW) > > + n += sprintf(p + n, " row: %d", mem->row); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_COLUMN) > > + n += sprintf(p + n, " column: %d", mem->column); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_BIT_POSITION) > > + n += sprintf(p + n, " bit_position: %d", mem->bit_pos); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_REQUESTOR_ID) > > + n += sprintf(p + n, " requestor_id: 0x%016llx", > > + mem->requestor_id); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_RESPONDER_ID) > > + n += sprintf(p + n, " responder_id: 0x%016llx", > > + mem->responder_id); > > + if (n >= LOC_LEN) > > + goto end; > > + if (mem->validation_bits & CPER_MEM_VALID_TARGET_ID) > > + n += sprintf(p + n, " target_id: 0x%016llx", mem->target_id); > > +end: > > + return; > > +} > > Looks like this wants to share with cper_print_mem() - definitely a lot > of duplication there. > > > + > > +static void dimm_err_location(struct cper_sec_mem_err *mem) > > +{ > > + const char *bank = NULL, *device = NULL; > > + > > + memset(dimm_location, 0, LOC_LEN); > > + if (!(mem->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)) > > + return; > > + > > + dmi_memdev_name(mem->mem_dev_handle, &bank, &device); > > + if (bank != NULL && device != NULL) > > + snprintf(dimm_location, LOC_LEN - 1, "%s %s", bank, device); > > + else > > + snprintf(dimm_location, LOC_LEN - 1, "DMI handle: 0x%.4x", > > + mem->mem_dev_handle); > > +} > > This one too. > Not really. Firstly they service for different purpose. Secondly the format here can be changed/updated depending on further requirment. I can't assume they always keep the same format. > > + > > +static void trace_mem_error(const uuid_le *fru_id, char *fru_text, > > + u64 err_count, u32 severity, > > + struct cper_sec_mem_err *mem) > > +{ > > + u32 etype = ~0U; > > + u64 phy_addr = ~0ull; > > I'm assuming userspace knows that all 1s means field value is invalid? Yep, I suppose so. > > > + unsigned long flags; > > + > > + if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) > > + etype = mem->error_type; > > newline. Sure. [...] > We probably need a mechanism to disable printking to dmesg once > userspace has opened the tracepoint. Do we really need to do that? IMHO, I think they are used for two different usages, just like dmesg & mcelog. [...] > > static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem) > > { > > if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS) > > @@ -233,8 +241,7 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem) > > if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) { > > u8 etype = mem->error_type; > > printk("%s""error_type: %d, %s\n", pfx, etype, > > - etype < ARRAY_SIZE(cper_mem_err_type_strs) ? > > - cper_mem_err_type_strs[etype] : "unknown"); > > + cper_mem_err_type_str(etype)); > > } > > if (mem->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) { > > const char *bank = NULL, *device = NULL; > > Ditto. I know you hope the print function in CPER & trace for cpi_extlog can be merged into one. I just have one concern about it. Can we ensure these two functions keeping align all the time? IOW, merge them for now until change happens one day? [...] > > +#define LOC_LEN 512 > > + > > +TRACE_EVENT(extlog_mem_event, > > So this is a mem thing so we're defining a tracepoint for memory events, > specifically. > > However, if extlog carries all kinds of errors outside, not only DRAM > errors, we should do a TRACE_EVENT_CLASS which contains the shared args > to every error type and then make a mem event ontop of it. I agree.
Attachment:
signature.asc
Description: Digital signature