RE: [PATCH v2] EDAC/i10nm: shift exponent is negative

> From: Luck, Tony <tony.luck@xxxxxxxxx>
> Sent: Wednesday, July 5, 2023 11:22 PM
> ...
> Subject: RE: [PATCH v2] EDAC/i10nm: shift exponent is negative
> 
> >> # head /proc/cpuinfo
> 
> This shows your system is the workstation version of Sapphire rapids. I don't
> think we did any validation of the EDAC driver against this model.

No, we didn't do any validation of the EDAC driver on Sapphire Rapids workstations.
From the link below, we know this is a Sapphire Rapids workstation with only 2 memory controllers.
https://www.intel.com/content/www/us/en/products/sku/233480/intel-xeon-w32435-processor-22-5m-cache-3-10-ghz/specifications.html

Previously, we only did validation on Sapphire Rapids servers, which have 4 memory controllers per socket.

> > # dmidecode -t 17
> 
> You have just one 16GB DIMM, and EDAC found that. So despite the messy
> warnings, EDAC should be working for you.
> 
> > # lspci
> 
> I didn't dig into this. Qiuxu - can you compare this against a server Sapphire
> rapids?
> Maybe it has some clues so the EDAC driver will know not to look for non-
> existent memory controllers.

This Sapphire Rapids workstation has 2 memory controllers, but 4 memory
controller PCIe devices appear in the log:

    0000:fe:0c.0 1101: 8086:324a
    0000:fe:0d.0 1101: 8086:324a // absent mc fooling the driver, should not appear
    0000:fe:0e.0 1101: 8086:324a
    0000:fe:0f.0 1101: 8086:324a // absent mc fooling the driver, should not appear

We observed that the MMIO registers of these absent
memory controllers consistently hold the value ~0.
So we may identify a memory controller as absent by checking
whether its MMIO register "mcmtr" == ~0 in all its channels.
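
As a rough illustration of that check (a standalone sketch only, not the
actual patch: struct mc_snapshot, NUM_CHANNELS and mc_is_absent() are
hypothetical names, and the 32-bit register width and the channel count
are assumptions made for the example):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed number of channels per memory controller (illustrative only). */
#define NUM_CHANNELS 2

/*
 * Hypothetical stand-in for the per-channel "mcmtr" values of one memory
 * controller, as they would be read back from MMIO. This is not the
 * driver's real data structure.
 */
struct mc_snapshot {
    uint32_t mcmtr[NUM_CHANNELS];
};

/*
 * Treat a memory controller as absent only if "mcmtr" reads back as
 * all-ones (~0) in every one of its channels. A single channel with a
 * different value means the controller is considered present.
 */
static bool mc_is_absent(const struct mc_snapshot *mc)
{
    for (size_t ch = 0; ch < NUM_CHANNELS; ch++) {
        if (mc->mcmtr[ch] != UINT32_MAX)
            return false;
    }
    return true;
}
```

Requiring ~0 in all channels, rather than in any one of them, keeps a
controller with one odd-looking channel from being skipped by mistake.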

I made the patch below to skip all these absent memory controllers:
https://lore.kernel.org/linux-edac/20230706134216.37044-1-qiuxu.zhuo@xxxxxxxxx/T/#u
@Koba Ko, could you please verify the patch from the link above on your workstation? Thanks! 

BTW,
Kai-Heng Feng also found the same issue before:
https://lore.kernel.org/linux-edac/CAAd53p41Ku1m1rapeqb1xtD+kKuk+BaUW=dumuoF0ZO3GhFjFA@xxxxxxxxxxxxxx/T/#m5de16dce60a8c836ec235868c7c16e3fefad0cc2

- Qiuxu



