On Wed, 19 Jun 2024 11:08:26 +0200
Lukas Wunner <lukas@xxxxxxxxx> wrote:

> On Tue, Jun 18, 2024 at 12:32:33PM -0700, Dan Williams wrote:
> > It strikes me that playing these initcall games is a losing battle and
> > that this case would be best served by late loading of NPEM
> > functionality.
> >
> > Something similar is happening with PCI device security where the
> > enabling depends on a third-party driver for a platform
> > "security-manager" (TSM) to arrive.
> >
> > The approach there is to make the functionality independent of
> > device-discovery vs TSM driver load order. So, if the TSM driver is
> > loaded early then pci_init_capabilities() can immediately enable the
> > functionality. If the TSM driver is loaded *after* some devices have already
> > gone through pci_init_capabilities(), then that event is responsible for
> > doing for_each_pci_dev() to catch up on devices that missed their
> > initial chance to turn on device security details.
> >
> > So, for NPEM, the thought would be to implement the same rendezvous
> > flow, i.e. s/TSM/NPEM/.
>
> A different viewpoint is that these issues are caused by the
> "division of labor" between OS kernel and platform firmware.
>
> In the NPEM case, Dell servers require the OS to call firmware
> to change LEDs. But before OS can do that, OS has to initialize
> a certain other interface with firmware.
>
> In the TSM case, Intel TDX Connect or AMD SEV-TIO require OS to
> ask firmware to perform certain authentication steps with devices,
> wherefore OS has to provide another interface to facilitate
> communication with the device.
>
> It's a complexity nightmare exacerbated by vendor-specific quirks.
>
> Which is why I'm arguing that firmware functionality (e.g. TDX module)
> should be constrained to the absolute minimum and the OS should be
> in control of as much as possible.
> That's the approach Apple has
> been following as it's the only way to achieve their close interplay
> between hardware and software without making things too complex.
>
> It seems what's keeping this series from working on Dell servers is
> primarily that the driver wants to read out LED status on probe.
> So I've recommended to Mariusz off-list to do that lazily if possible,
> i.e. on first read of a LED's status.
>
> Then if users do try to read or write LED status on Dell servers without
> loading IPMI modules first, they get to keep the pieces, sorry. :(
>

Initially, I thought that Dan's suggestion was the best option, but
after taking into account the use cases of the driver and the times
provided by Stuart, lazy loading wins.

As an LED application maintainer, I can accept the fact that I cannot
set LEDs for a while and that errors will be reported; that is fine.
I can leave a hint for the user explaining why it is happening.

It would be a nightmare to get a new LED controller after some time if
the appearance of the LED interface is delayed. That is much worse from
the user's perspective, because no device means that I have no
information in userland. I cannot determine whether something is going
to come up soon, so I would have to report disks as not supported. That
is unnecessary maintenance hell, and I may receive a lot of issues.

Stuart, please give me some time to apply the suggestions and introduce
the lazy approach. I'm working on it!

Thanks,
Mariusz
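
For illustration, the lazy approach discussed above can be sketched in
plain C roughly as follows. This is a userspace model only, not the NPEM
driver: backend_read(), struct lazy_led and all other names are
hypothetical, and the "backend" stands in for a firmware interface that
becomes usable only after another module (IPMI in the Dell case) has
been loaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the platform backend. Not a real kernel API. */
static bool backend_ready;              /* set once the backend is loaded */
static unsigned int backend_hw_state;   /* LED state held by the backend */

static int backend_read(unsigned int *state)
{
	if (!backend_ready)
		return -ENODEV;         /* backend not available yet */
	*state = backend_hw_state;
	return 0;
}

struct lazy_led {
	bool cached;            /* true once the state was read successfully */
	unsigned int state;     /* last state read from the backend */
};

/* Probe touches no backend, so it cannot fail because of load order. */
static void lazy_led_probe(struct lazy_led *led)
{
	assert(led != NULL);
	led->cached = false;
	led->state = 0;
}

/*
 * The first successful read populates the cache. Reads issued before the
 * backend is ready return an error and can simply be retried later.
 */
static int lazy_led_get(struct lazy_led *led, unsigned int *state)
{
	int ret;

	assert(led != NULL && state != NULL);
	if (!led->cached) {
		ret = backend_read(&led->state);
		if (ret)
			return ret;
		led->cached = true;
	}
	*state = led->state;
	return 0;
}
```

In this model the LED object exists from probe time onward, so userspace
always sees the device; only the read fails (here with -ENODEV) until the
backend shows up, which matches the "errors will be reported for a while"
trade-off described above.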