Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Borislav Petkov wrote:
> On Thu, May 09, 2024 at 02:21:28PM -0700, Dan Williams wrote:
> > Recall that there are 461 usages of module_pci_driver() in the kernel.
> > Every one of those arranges for just registering a PCI driver when the
> > module is loaded regardless of whether any devices that driver cares
> > about are present.
> 
> Sorry, I read your text a bunch of times but I still have no clue what
> you're trying to tell me.

Because taking this proposal to its logical end of "if a simple check is
possible, why not do it in module_init()" has wide implications like the
module_pci_driver() example.

> All *I* am saying is since this is a new subsystem and the methods for
> detecting the scrub functionality are two - either an ACPI table or
> a GET_SUPPORTED_FEATURES command, then the init function of the
> subsystem:

No, at a minimum that's a layering violation. This is a generic library
facility that should not care if it is being called for a CXL device or
an ACPI device. It also causes functional issues, see below:

> +static int __init memory_scrub_control_init(void)
> +{
> +       return class_register(&scrub_class);
> +}
> +subsys_initcall(memory_scrub_control_init);
> 
> can check for those two things before initializing.
> 
> If there is no scrubbing functionality, then it can return an error and
> not load.
> 
> The same as when we don't load x86 drivers on the wrong vendor and so
> on.

I think it works for x86 drivers because the functionality in those
modules is wholly contained within that one module. This scrub module is
a service library for other modules.

> If the check is easy, why not do it?

It is functionally the wrong place to do the check. When module_init()
fails it causes not only the current module to be unloaded but any
dependent modules will also fail to load.

Let's take an example of the CXL driver wanting to register with this
scrub interface to support the capability that *might* be available on
some CXL devices. The cxl_pci.ko module, that houses cxl_pci_driver,
grows a call to scrub_device_register(). That scrub_device_register()
call is statically present in cxl_pci.ko so that when cxl_pci.ko loads
symbol resolution requires scrub.ko to load.

Neither of those modules (cxl_pci.ko or scrub.ko) load automatically.
Either udev loads cxl_pci.ko because it sees a device that matches
cxl_mem_pci_tbl, or the user manually insmods those modules because they
think they know better. No memory wasted unless the user explicitly asks
for memory to be wasted.

If no CXL devices in the system have scrub capabilities, great, then
scrub_device_register() will never be called.

Now, if memory_scrub_control_init() did its own awkward and redundant
CXL scan, and fails with "no CXL scrub capable devices found" it would
not only block scrub.ko from loading, but also cxl_pci.ko since
cxl_pci.ko needs to resolve that symbol to load.

All of that said, you are right that there is still a scenario where
memory is wasted. I.e. the case where a subsystem like CXL or ACPI wants
the runtime *option* of calling scrub_device_register(), but never does.
That will inflict the cost of registering a vestigial scrub_class. That
can be mitigated with another layer of module indirection where
cxl_pci_driver registers a cxl_scrub_device and then a cxl_scrub_driver
in its own module calls scrub_device_register() with the scrub core.

I would entertain that extra indirection long before I would entertain
memory_scrub_control_init() growing scrub device enumeration that
belongs to the *caller* of scrub_device_register().

> Make more sense?

It is a reasonable question, but all module libraries incur init costs
just by being linked by another module. You can walk /sys/class to see
how many other subsystems are registering class devices but never using
them.

I would not say "no" to a generic facility that patches out module
dependencies until the first call, just not sure the return on
investment would be worth it.

Lastly I think drivers based on ACPI tables are awkward. They really
need to have an ACPI device to attach so that typical automatic Linux
module loading machinery can be used. The fact this function is a
subsys_initcall() is a red-flag since nothing should be depending on the
load order of a little driver to control scrub parameters.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux