RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from loading after ghes_edac is registered

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Kani, Toshi <toshi.kani@xxxxxxx>
> Sent: Friday, August 19, 2022 12:49 PM
> To: Justin He <Justin.He@xxxxxxx>
> Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> edac@xxxxxxxxxxxxxxx; devel@xxxxxxxxxx; Rafael J . Wysocki
> <rafael@xxxxxxxxxx>; Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>; Jarkko
> Sakkinen <jarkko@xxxxxxxxxx>; linux-efi@xxxxxxxxxxxxxxx; nd
> <nd@xxxxxxx>; Ard Biesheuvel <ardb@xxxxxxxxxx>; Len Brown
> <lenb@xxxxxxxxxx>; James Morse <James.Morse@xxxxxxx>; Tony Luck
> <tony.luck@xxxxxxxxx>; Borislav Petkov <bp@xxxxxxxxx>; Mauro Carvalho
> Chehab <mchehab@xxxxxxxxxx>; Robert Richter <rric@xxxxxxxxxx>; Robert
> Moore <robert.moore@xxxxxxxxx>; Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>;
> Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac
> from loading after ghes_edac is registered
> 
> On Friday, August 19, 2022 4:35 AM, Justin He wrote:
> > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct
> platform_device
> > > > *ghes_dev)
> > > >  	platform_set_drvdata(ghes_dev, ghes);
> > > >
> > > >  	ghes->dev = &ghes_dev->dev;
> > > > +	set_ghes_devs_registered(false);
> > >
> > > This does not look right to me.
> > >
> > > The condition of using ghes_edac is (ghes-present) && (ghes-
> preferred),
> > > where:
> > >  - ghes-present is latched on ghes_probe()
> > >  - ghes-preferred is true if the platform-check passes.
> > >
> > > ghes_get_device() introduced in the previous patch works as the
> > > ghes-preferred check.
> > >
> > > We cannot use ghes_edac registration as the ghes-present check in
> this
> > > scheme since it is deferred to module_init().
> >
> > What is the logic for ghes-present check? In this patch, I assumed it
> is equal to
> > "ghes_edac devices have been registered". Seems it is not correct.
> 
> Using (ghes_edac-registered) is a wrong check in this scheme
> since ghes_edac registration is deferred.  This check is fine in
> the current scheme since ghes_edac gets registered before
> any other chipset-specific edac drivers.
> 
> > But should we consider the case as follows:
> > What if sbridge_init () is invoked before ghes_edac_init()? i.e.
> Should we get
> > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on
> HPE)?
> 
> No.  The point is that ghes_edac driver needs to be selected,
> "regardless of the module ordering", on a system with GHES
> present & preferred.
>
> Note that this new scheme leads to the following condition
> change:
> - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered)
> - New: (ghes-present) && (ghes-preferred)
> 
> The option I suggested previously keeps the current condition,
> but this new scheme does not for better modularity.
> 
> What this means is that if ghes_edac is not enabled (but ghes
> is enabled) on a system with GHES present & preferred, no edac
> driver gets registered.  This change is fine from my (HPE) perspective
> and should be fine for other GHES systems.  GHES systems have
> chipset-specific edac driver in FW.  OS-based chipset-specific edac
> driver is not necessary and may lead to a conflict of chipset register
> ownership.

Currently, running with this on the kernel command line
	ghes.disable

causes the ACPI ghes driver to quit early in acpi_ghes_init():

  /*
   * This driver isn't really modular, however for the time being,
   * continuing to use module_param is the easiest way to remain
   * compatible with existing boot arg use cases.
   */
  bool ghes_disable;
  module_param_named(disable, ghes_disable, bool, 0);

which results in the skx_edac module assuming it should run:
  [   33.628140] calling  skx_init+0x0/0xe5a [skx_edac] @ 1444
  [   33.628531] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:36:0a.0 (INTERRUPT)
  [   33.641432] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:36:0c.0 (INTERRUPT)
  [   33.653256] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
  [   33.665055] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
  [   33.676801] initcall skx_init+0x0/0xe5a [skx_edac] returned 0 after 36343 usecs

We might need to differentiate between the system ROM really not
offering GHES vs. the ghes module not running.





[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux