> -----Original Message----- > From: Kani, Toshi <toshi.kani@xxxxxxx> > Sent: Friday, August 19, 2022 12:49 PM > To: Justin He <Justin.He@xxxxxxx> > Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux- > edac@xxxxxxxxxxxxxxx; devel@xxxxxxxxxx; Rafael J . Wysocki > <rafael@xxxxxxxxxx>; Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>; Jarkko > Sakkinen <jarkko@xxxxxxxxxx>; linux-efi@xxxxxxxxxxxxxxx; nd > <nd@xxxxxxx>; Ard Biesheuvel <ardb@xxxxxxxxxx>; Len Brown > <lenb@xxxxxxxxxx>; James Morse <James.Morse@xxxxxxx>; Tony Luck > <tony.luck@xxxxxxxxx>; Borislav Petkov <bp@xxxxxxxxx>; Mauro Carvalho > Chehab <mchehab@xxxxxxxxxx>; Robert Richter <rric@xxxxxxxxxx>; Robert > Moore <robert.moore@xxxxxxxxx>; Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>; > Yazen Ghannam <yazen.ghannam@xxxxxxx> > Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac > from loading after ghes_edac is registered > > On Friday, August 19, 2022 4:35 AM, Justin He wrote: > > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct > platform_device > > > > *ghes_dev) > > > > platform_set_drvdata(ghes_dev, ghes); > > > > > > > > ghes->dev = &ghes_dev->dev; > > > > + set_ghes_devs_registered(false); > > > > > > This does not look right to me. > > > > > > The condition of using ghes_edac is (ghes-present) && (ghes- > preferred), > > > where: > > > - ghes-present is latched on ghes_probe() > > > - ghes-preferred is true if the platform-check passes. > > > > > > ghes_get_device() introduced in the previous patch works as the > > > ghes-preferred check. > > > > > > We cannot use ghes_edac registration as the ghes-present check in > this > > > scheme since it is deferred to module_init(). > > > > What is the logic for ghes-present check? In this patch, I assumed it > is equal to > > "ghes_edac devices have been registered". Seems it is not correct. > > Using (ghes_edac-registered) is a wrong check in this scheme > since ghes_edac registration is deferred. This check is fine in > the current scheme since ghes_edac gets registered before > any other chipset-specific edac drivers. > > > But should we consider the case as follows: > > What if sbridge_init () is invoked before ghes_edac_init()? i.e. > Should we get > > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on > HPE)? > > No. The point is that ghes_edac driver needs to be selected, > "regardless of the module ordering", on a system with GHES > present & preferred. > > Note that this new scheme leads to the following condition > change: > - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered) > - New: (ghes-present) && (ghes-preferred) > > The option I suggested previously keeps the current condition, > but this new scheme does not for better modularity. > > What this means is that if ghes_edac is not enabled (but ghes > is enabled) on a system with GHES present & preferred, no edac > driver gets registered. This change is fine from my (HPE) perspective > and should be fine for other GHES systems. GHES systems have > chipset-specific edac driver in FW. OS-based chipset-specific edac > driver is not necessary and may lead to a conflict of chipset register > ownership. Currently, running with this on the kernel command line ghes.disable causes the ACPI ghes driver to quit early in acpi_ghes_init(): /* * This driver isn't really modular, however for the time being, * continuing to use module_param is the easiest way to remain * compatible with existing boot arg use cases. */ bool ghes_disable; module_param_named(disable, ghes_disable, bool, 0); which results in the skx_edac module assuming it should run: [ 33.628140] calling skx_init+0x0/0xe5a [skx_edac] @ 1444 [ 33.628531] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:36:0a.0 (INTERRUPT) [ 33.641432] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:36:0c.0 (INTERRUPT) [ 33.653256] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) [ 33.665055] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) [ 33.676801] initcall skx_init+0x0/0xe5a [skx_edac] returned 0 after 36343 usecs We might need to differentiate between the system ROM really not offering GHES vs. the ghes module not running.