Re: Extended H/W error log driver

On Fri, Oct 11, 2013 at 02:32:38AM -0400, Chen, Gong wrote:
> [56005.785917] {3}Hardware error detected on CPU0
> [56005.785959] {3}event severity: corrected
> [56005.785975] {3}sub_event[0], severity: corrected
> [56005.785977] {3}section_type: memory error
> [56005.785981] {3}physical_address: 0x0000000851fe0000
> [56005.786027] {3}DIMM location: Memriser1 CHANNEL A DIMM 0

Very good, guys! I've been waiting years for this to be possible.
Good job! :-)

Btw, what's "Memriser1"?

> [56005.786154] {4}Hardware error detected on CPU0
> [56005.786159] {4}event severity: corrected
> [56005.786162] {4}sub_event[0], severity: corrected

This sub_event[0] could use better decoding though.

> [56005.786166] {4}section_type: memory error
> 
> 
> trace output:
> 
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 4/4   #P:120
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
> ...
> ...
>           <idle>-0     [000] d.h. 56068.488759: extlog_mem_event: 3 corrected errors:unknown

That "unknown" thing needs a " " in front of it and comes from
cper_mem_err_type_str, AFAICT. I'm guessing the value is 0 and
uninitialized or so?
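The guess can be illustrated with a user-space sketch of the kind of bounds-checked table lookup that cper_mem_err_type_str does. This is not the kernel source. The table is a partial, hypothetical copy containing only the first four CPER memory error types, with strings assumed. Index 0 and any out-of-range value both decode to "unknown", so a type field that was left zeroed would print exactly what the trace above shows.

```c
#include <assert.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Partial, hypothetical table: only the first four CPER memory error types. */
static const char *const mem_err_type_strs[] = {
	"unknown",        /* 0: also what a zeroed, never-filled field decodes to */
	"no error",       /* 1 */
	"single-bit ECC", /* 2 */
	"multi-bit ECC",  /* 3 */
};

/* Bounds-checked lookup; out-of-range values also fall back to "unknown". */
const char *mem_err_type_str(unsigned int etype)
{
	return etype < ARRAY_SIZE(mem_err_type_strs) ?
		mem_err_type_strs[etype] : "unknown";
}
```

Because both paths produce the same string, "unknown" in the output alone cannot distinguish an explicit "unknown" type from a field the firmware never set.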

> on Memriser1 CHANNEL A DIMM 0(FRU:

Also another " " missing here.

> 00000000-0000-0000-0000-000000000000  physical addr: 0x0000000851fe0000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 28927 column: 1296)
>           <idle>-0     [000] d.h. 56068.488834: extlog_mem_event: 4 corrected errors:unknown
> ...
> ...
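The two missing spaces pointed out above, after "errors:" and before "(FRU:", are a format-string matter. The following user-space sketch uses a hypothetical helper, format_mem_event, that is not the driver's actual trace format. It renders a simplified version of the line with both spaces in place and omits the FRU text and the node/card/module/rank/bank/row/column fields for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the driver's real TP_printk format: renders a
 * simplified extlog_mem_event line with a space after "errors:" and a
 * space before "(FRU:".
 */
int format_mem_event(char *buf, size_t len, unsigned int count,
		     const char *severity, const char *type,
		     const char *location, const char *fru_id,
		     unsigned long long phys_addr)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len,
		     "%u %s errors: %s on %s (FRU: %s physical addr: 0x%016llx)",
		     count, severity, type, location, fru_id, phys_addr);
	/* 0 on success, -1 if snprintf failed or the output was truncated */
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Fed the values from the first trace line (3, "corrected", "unknown", "Memriser1 CHANNEL A DIMM 0", an all-zero FRU UUID and 0x851fe0000), this produces "3 corrected errors: unknown on Memriser1 CHANNEL A DIMM 0 (FRU: ...". The %016llx conversion reproduces the zero-padded physical address seen in the original output.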
> 
> The dmesg output has been trimmed to keep only the most important data;
> the trace output contains most of the data. I'm not sure all fields are
> meaningful to users. Some fields, like FRU ID/FRU TEXT, depend on the BIOS
> manufacturer, so comments on what is needed and what isn't are welcome.

Yeah, I guess we again depend on BIOS people to fill those in. I'd
expect serious server manufacturers who care about RAS to do so...

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html