On Tue, Jun 18, 2024 at 12:32:33PM -0700, Dan Williams wrote:
> It strikes me that playing these initcall games is a losing battle and
> that this case would be best served by late loading of NPEM
> functionality.
>
> Something similar is happening with PCI device security where the
> enabling depends on a third-party driver for a platform
> "security-manager" (TSM) to arrive.
>
> The approach there is to make the functionality independent of
> device-discovery vs TSM driver load order. So, if the TSM driver is
> loaded early then pci_init_capabilities() can immediately enable the
> functionality. If the TSM driver is loaded *after* some devices have already
> gone through pci_init_capabilities(), then that event is responsible for
> doing for_each_pci_dev() to catch up on devices that missed their
> initial chance to turn on device security details.
>
> So, for NPEM, the thought would be to implement the same rendezvous
> flow, i.e. s/TSM/NPEM/.

A different viewpoint is that these issues are caused by the
"division of labor" between OS kernel and platform firmware.

In the NPEM case, Dell servers require the OS to call firmware
to change LEDs.  But before OS can do that, OS has to initialize
a certain other interface with firmware.

In the TSM case, Intel TDX Connect or AMD SEV-TIO require OS to
ask firmware to perform certain authentication steps with devices,
wherefore OS has to provide another interface to facilitate
communication with the device.

It's a complexity nightmare exacerbated by vendor-specific quirks.

Which is why I'm arguing that firmware functionality (e.g. TDX module)
should be constrained to the absolute minimum and the OS should be
in control of as much as possible.  That's the approach Apple has
been following as it's the only way to achieve their close interplay
between hardware and software without making things too complex.

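For reference, the load-order-independent rendezvous described in the
quoted text could look roughly like the following.  This is a userspace
sketch only: struct dev, struct feature_ops, discover_dev() and
register_provider() are hypothetical names made up for illustration and
are not taken from the kernel or from the TSM patches.  Real code would
presumably also need locking, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for struct pci_dev. */
struct dev {
	bool present;		/* device has been enumerated */
	bool feature_on;	/* optional feature enabled on this device */
};

/* Hypothetical callbacks supplied by a driver that may load late. */
struct feature_ops {
	void (*enable)(struct dev *d);
};

static struct dev devs[MAX_DEVS];
static const struct feature_ops *provider;	/* NULL until the driver loads */

static void default_enable(struct dev *d)
{
	d->feature_on = true;
}

static const struct feature_ops default_ops = { .enable = default_enable };

/* Enable the feature if a provider exists and the device still lacks it. */
static void try_enable(struct dev *d)
{
	if (provider && d->present && !d->feature_on)
		provider->enable(d);
}

/* Discovery path, playing the role of pci_init_capabilities(). */
static struct dev *discover_dev(int idx)
{
	assert(idx >= 0 && idx < MAX_DEVS);
	devs[idx].present = true;
	try_enable(&devs[idx]);	/* takes effect only if the provider came first */
	return &devs[idx];
}

/* Provider load path: catch up on devices that were discovered earlier. */
static void register_provider(const struct feature_ops *ops)
{
	assert(ops && ops->enable);
	provider = ops;
	for (int i = 0; i < MAX_DEVS; i++)	/* role of for_each_pci_dev() */
		try_enable(&devs[i]);
}
```

Either order ends in the same state: a device discovered before
register_provider() is picked up by the catch-up loop, and a device
discovered afterwards is enabled directly in discover_dev().
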
It seems what's keeping this series from working on Dell servers is
primarily that the driver wants to read out LED status on probe.
So I've recommended to Mariusz off-list to do that lazily if possible,
i.e. on first read of a LED's status.  Then if users do try to read
or write LED status on Dell servers without loading IPMI modules first,
they get to keep the pieces, sorry. :(

> I am an overdue for a refresh of the TSM patches

No hurry, there's a refresh of the OS-owned PCI device authentication
coming up before end of this month.

I'm taking my "TDX Connect heretic" hat off now. :)

Thanks,

Lukas
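
A rough userspace sketch of the lazy readout suggested above, in case
it helps the discussion.  Everything here is an assumption for
illustration: struct led_state, led_probe(), led_get() and the two
backend functions are hypothetical and do not reflect the actual NPEM
driver.  The point is only that probe never touches the backend and
that a failed first read is reported and retried on the next access.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device LED state, not the real driver structure. */
struct led_state {
	bool cached;		/* true once status was read successfully */
	unsigned int status;	/* last known LED bits */
	int (*hw_read)(unsigned int *status);	/* hypothetical backend accessor */
};

static int backend_reads;	/* counts backend accesses, for demonstration */

/* Hypothetical backend that is ready (needed modules are loaded). */
static int ready_read(unsigned int *status)
{
	backend_reads++;
	*status = 0x4;
	return 0;
}

/* Hypothetical backend that is not ready yet. */
static int unready_read(unsigned int *status)
{
	(void)status;
	backend_reads++;
	return -1;
}

/* Probe only records how to reach the backend; it performs no read. */
static void led_probe(struct led_state *s, int (*hw_read)(unsigned int *))
{
	assert(s && hw_read);
	s->cached = false;
	s->status = 0;
	s->hw_read = hw_read;
}

/* First successful call reads the backend; later calls use the cache. */
static int led_get(struct led_state *s, unsigned int *status)
{
	if (!s->cached) {
		unsigned int val;
		int ret = s->hw_read(&val);

		if (ret)
			return ret;	/* backend not ready: report, retry later */
		s->status = val;
		s->cached = true;
	}
	*status = s->status;
	return 0;
}
```

With this shape, led_probe() succeeds regardless of whether the backend
is usable yet, and only the user who actually reads the status sees an
error if it is not.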