Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event


On Tue, 13 Aug 2013 23:02:08 +0530
"Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx> wrote:

> On 08/13/2013 06:12 PM, Borislav Petkov wrote:
> > On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
> >> You're right - my tracepoint makes all the data provided by APEI
> >> available as-is to userspace. However, ghes_edac seems to squash some
> >> of this data into a string when reporting through mc_event.
> >
> > Right, for systems which don't need EDAC to decode to the DIMM or for
> > which there are no EDAC drivers written, they could use a tracepoint
> > which carries APEI info as-is. Others, which need EDAC, should probably
> > use trace_mc_event and disable the APEI tracepoint.
> 
> If I'm not mistaken, even for systems that have EDAC drivers, it looks 
> to me like EDAC can't really decode to the DIMM given what is currently 
> provided by the BIOS in the APEI report.

Yes, the current APEI events reported via EDAC can't be decoded.

> If and when ghes_edac gains 
> this capability, users will have a choice between raw APEI reports vs. 
> edac processed ones.

An APEI-specific tracepoint won't fix it, as, AFAICT, we don't have any way
to map it, even in userspace.

> 
> >
> > I think this should address Tony's concerns...
> >
> > Btw, you could call your TP something simpler like
> > trace_ghes_memory_event or so.
> 
> I started out with a simpler name, but eventually decided to use the 
> name from the CPER record so it is clear what this event carries. I 
> think this will be better when adding further ghes events for say, 
> processor generic, PCIe and others.
> 
> >
> > Btw 2, if GHES can report other types of errors (I'm pretty sure it can)
> > maybe we can use a single tracepoint called trace_ghes_event for all
> > types of errors coming out of it...
> 
> Two problems with this:
> - One, the record size will be really big since the CPER records for 
> each type of error are large.
> - Two, it may be better to filter events based on the type of error 
> (memory error, processor, pcie, ...) rather than subscribing for all 
> ghes error reports.

I agree: per-error-type events are better than one big generic event.

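
A minimal userspace sketch of why per-type events make filtering cheap, assuming a hypothetical per-type enable mask (the names ghes_err_type, enable_event() and report() are illustrative only, not kernel API): a consumer enables just the memory event, and processor or PCIe reports are dropped before any record is built.

```c
#include <stdbool.h>

/* Hypothetical error classes, loosely modelled on the CPER section types
 * mentioned in this thread (memory, processor, PCIe). */
enum ghes_err_type { GHES_ERR_MEM, GHES_ERR_PROC, GHES_ERR_PCIE, GHES_ERR_NR };

/* One enable bit per error type; a consumer subscribes only to what it wants. */
static unsigned int enabled_mask;

/* Number of records actually emitted, per type. */
static int emitted[GHES_ERR_NR];

static void enable_event(enum ghes_err_type t)
{
    enabled_mask |= 1u << t;
}

static bool event_enabled(enum ghes_err_type t)
{
    return (enabled_mask >> t) & 1u;
}

/* Hypothetical reporting path: a disabled type costs only a bit test. */
static void report(enum ghes_err_type t)
{
    if (!event_enabled(t))
        return;         /* filtered at the source, no (large) record built */
    emitted[t]++;       /* stand-in for building and emitting the record */
}
```

With only GHES_ERR_MEM enabled, two memory reports and one each of processor and PCIe leave emitted[] at 2, 0, 0. With a single generic event the consumer would instead receive every record and have to discard the unwanted ones itself. The real tracing subsystem already allows enabling individual events under events/<system>/<event>/enable, which is what per-type tracepoints would plug into.
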
> 
> >
> > Oh, and while at it, we probably need to start thinking of a mechanism
> > to disable all the error printing, i.e. cper_print_mem() and such,
> > if a userspace agent is listening in on the tracepoint and the error
> > information is carried through it to userspace.
> 
> Do you mean conditionally print the cper records based on whether the 
> tracepoint is enabled or not? Wouldn't that be confusing if someone is 
> monitoring dmesg as well?
> 
> 
> Thanks,
> Naveen
> 
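
A minimal userspace sketch of the conditional-printing idea under discussion, assuming a hypothetical tp_listener_attached flag in place of a real "is the tracepoint enabled?" check (later kernels added a trace_<event>_enabled() helper for this kind of test) and a simulated log buffer in place of dmesg. As an illustrative design choice aimed at the dmesg concern, it keeps a one-line hint instead of suppressing output entirely.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for "a userspace agent is consuming the tracepoint". */
static bool tp_listener_attached;

/* Simulated kernel log so the behaviour can be inspected. */
static char klog[256];

static void log_line(const char *s)
{
    strncat(klog, s, sizeof(klog) - strlen(klog) - 1);
}

/* Hypothetical reporting path for one memory error record. */
static void report_mem_error(unsigned long long phys_addr)
{
    char buf[64];

    if (tp_listener_attached)
        /* Full record goes out via the tracepoint; leave a short hint so
         * someone watching dmesg still sees that an error happened. */
        snprintf(buf, sizeof(buf), "mem error @%#llx (see trace)\n", phys_addr);
    else
        /* Nobody listening: print the verbose, cper_print_mem()-style dump. */
        snprintf(buf, sizeof(buf), "mem error @%#llx: full CPER dump\n", phys_addr);
    log_line(buf);
}
```

Whether a short hint or nothing at all should be printed when a listener is present is exactly the open question above; the sketch only shows where such a check would sit.
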


-- 

Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
