Alejandro Lucero Palau wrote: > > On 10/23/24 02:43, Dan Williams wrote: > > When the CXL subsystem is built-in the module init order is determined > > by Makefile order. That order violates expectations. The expectation is > > that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins > > the race cxl_mem will find the enabled CXL root ports it needs and if > > cxl_acpi loses the race it will retrigger cxl_mem to attach via > > cxl_bus_rescan(). That only works if cxl_acpi can assume ports are > > enabled immediately upon cxl_acpi_probe() return. That in turn can only > > happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears > > before the cxl_acpi object in the Makefile. > > > I'm having problems with understanding this. The acpi module is > initialised following the initcall levels, as defined by the code with > the subsys_initcall(cxl_acpi_init), and the cxl_mem module is not, so > AFAIK, there should not be any race there with the acpi module always > being initialised first. It I'm right, the problem should be another one > we do not know yet ... This is a valid point, and I do think that cxl_port should also move to subsys_initcall() for completeness. However, the reason this Makefile change works, even though cxl_acpi finishes init before cxl_port when both are built-in, is due to device discovery order. With the old Makefile order it is possible for cxl_mem to race cxl_acpi_probe() in a way that defeats the cxl_bus_rescan() that is there to resolve device discovery races. > > Fix up the order to prevent initialization failures, and make sure that > > cxl_port is built-in if cxl_acpi is also built-in. > > ... or forcing cxl_port to be built-in is enough. I wonder how, without > it, the cxl root ports can be there for cxl_mem ... It does not need to be there for cxl_mem. It is ok for cxl_mem to load and complete enumeration well before cxl_acpi ever arrives. As long as cxl_bus_rescan() enables those devices after the fact then everything is ok. The problematic case being fixed is the opposite, i.e. that cxl_bus_rescan() completes and never triggers again after cxl_mem has failed to find the root ports.