On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote:
> On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet. This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms, as the driver always emits the messages below:
> >
> > This EDAC driver relies on BIOS to enumerate memory and get error
> > reports. Unfortunately, not all BIOSes reflect the memory layout
> > correctly. So, the end result of using this driver varies from
> > vendor to vendor. If you find incorrect reports, please contact
> > your hardware vendor to correct its BIOS.
> >
> > To get out from this situation, add a platform type check to
> > selectively enable the driver on the platforms that are known to
> > have proper firmware implementation. Platform vendors can add
> > their platforms to the list when they support ghes_edac.
>
> So maintaining whitelists for things has always been a PITA and we
> should try to avoid it, if possible. (We can always do it if nothing
> saner comes along.)

Agreed.

> Now, below is a dirty patch converting ghes_edac to a normal module.
> On systems where we have GHES, the firmware generally disables the
> detection of the presence of ECC hardware, thus preventing the
> platform EDAC driver from loading.

I have HPE Haswell and Skylake test systems with GHES, but they do not
hide IMCs from the OS. So, the sb_edac and skx_edac drivers get attached
on these systems when ghes_edac is disabled.

> Let me clarify: I have an AMD HP box which, when GHES is enabled in
> the BIOS, says that ECC is disabled in the memory controller and the
> amd64_edac driver doesn't load for that memory controller.

Hmm... what's the platform name of this box? I can look into this case
if you need.

> And I think we should try this first: have the firmware disable
> detection methods so that the platform drivers don't load.

I do not think we can rely on this method.

> Then, ghes_edac can be a simple module and no other driver would
> attempt loading.

I like the use of notifier chain, which is much cleaner.

> The question is: does the platform do this disabling now?

Unfortunately, that is not the case today. The IMCs cannot be hidden
with the Device Hide registers for Skylake at least.

Thanks,
-Toshi