[stable request 3.18] sb_edac: Fix discovery of top-of-low-memory for Haswell


Please consider upstream 3.19 commit
f7cf2a22a2896d3b3595b71d7936b6d7a3316b00 "sb_edac: Fix discovery of
top-of-low-memory for Haswell" for stable tree 3.18.

This patch addresses an issue introduced by 3.17 commit
50d1bb93672fa2f42cec6e06ce799fbe864f57e9 "sb_edac: add support for
Haswell based systems".

commit f7cf2a22a2896d3b3595b71d7936b6d7a3316b00
Author: Tony Luck <tony.luck@xxxxxxxxx>
Date:   Wed Oct 29 10:36:50 2014 -0700

    sb_edac: Fix discovery of top-of-low-memory for Haswell

    Haswell moved the TOLM/TOHM registers to a different device and offset.
    The sb_edac driver accounted for the change of device, but not for the
    new offset.  There was also a typo in the constant to fill in the low
    26 bits (was 0x1ffffff, should be 0x3ffffff).

    This resulted in a bogus value for the top of low memory:

      EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

    which would result in EDAC refusing to translate addresses for
    errors above the bogus value and below 4GB:

       sbridge MC3: HANDLING MCE MEMORY ERROR
       sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
       sbridge MC3: TSC 0
       sbridge MC3: ADDR 2000000
       sbridge MC3: MISC 523eac86
       sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
       MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)

    With the fix we see the correct TOLM value:

       DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

    and we decode address 2000000 correctly:

       sbridge MC3: HANDLING MCE MEMORY ERROR
       sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
       sbridge MC3: TSC 0
       sbridge MC3: ADDR 2000000
       sbridge MC3: MISC 523e1086
       sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
       DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
       DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
       DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
       DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
       DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
       MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

    Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
    Acked-by: Aristeu Rozanski <aris@xxxxxxxxxx>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@xxxxxxxxxxxxxxx>


Cheers,
Vinson