On 10/24/24 17:32, Dan Williams wrote:
Alejandro Lucero Palau wrote:
On 10/23/24 02:43, Dan Williams wrote:
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
the race cxl_mem will find the enabled CXL root ports it needs and if
cxl_acpi loses the race it will retrigger cxl_mem to attach via
cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
enabled immediately upon cxl_acpi_probe() return. That in turn can only
happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
before the cxl_acpi object in the Makefile.
I'm having problems with understanding this. The acpi module is
initialised following the initcall levels, as defined by the code with
the subsys_initcall(cxl_acpi_init), and the cxl_mem module is not, so
AFAIK, there should not be any race there with the acpi module always
being initialised first. It I'm right, the problem should be another one
we do not know yet ...
This is a valid point, and I do think that cxl_port should also move to
subsys_initcall() for completeness.
However, the reason this Makefile change works, even though cxl_acpi
finishes init before cxl_port when both are built-in, is due to device
discovery order.
With the old Makefile order it is possible for cxl_mem to race
cxl_acpi_probe() in a way that defeats the cxl_bus_rescan() that is
there to resolve device discovery races.
OK. Then rephrasing the commit would help.
Apart from that:
Tested-by: Alejandro Lucero <alucerop@xxxxxxx>
Reviewed-by: Alejandro Lucero <alucerop@xxxxxxx>
Fix up the order to prevent initialization failures, and make sure that
cxl_port is built-in if cxl_acpi is also built-in.
... or forcing cxl_port to be built-in is enough. I wonder how, without
it, the cxl root ports can be there for cxl_mem ...
It does not need to be there for cxl_mem. It is ok for cxl_mem to load
and complete enumeration well before cxl_acpi ever arrives. As long as
cxl_bus_rescan() enables those devices after the fact then everything is
ok.
The problematic case being fixed is the opposite, i.e. that
cxl_bus_rescan() completes and never triggers again after cxl_mem has
failed to find the root ports.