On Thu, 24 Oct 2024 09:19:17 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > Jonathan Cameron wrote: > > On Tue, 22 Oct 2024 18:43:24 -0700 > > Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > > > When the CXL subsystem is built-in the module init order is determined > > > by Makefile order. That order violates expectations. The expectation is > > > that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins > > > the race cxl_mem will find the enabled CXL root ports it needs and if > > > cxl_acpi loses the race it will retrigger cxl_mem to attach via > > > cxl_bus_rescan(). That only works if cxl_acpi can assume ports are > > > enabled immediately upon cxl_acpi_probe() return. That in turn can only > > > happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears > > > before the cxl_acpi object in the Makefile. > > > > > > Fix up the order to prevent initialization failures, and make sure that > > > cxl_port is built-in if cxl_acpi is also built-in. > > > > > > As for what contributed to this not being found earlier, the CXL > > > regression environment, cxl_test, builds all CXL functionality as a > > > module to allow to symbol mocking and other dynamic reload tests. As a > > > result there is no regression coverage for the built-in case. > > > > > > Reported-by: Gregory Price <gourry@xxxxxxxxxx> > > > Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx > > > Tested-by: Gregory Price <gourry@xxxxxxxxxx> > > > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") > > > > I don't like this due to likely long term fragility, but any other > > Please be specific about the fragility and how is this different than > any other Makefile order fragility, like the many cases in > drivers/Makefile/, or patches fixing up initcall order? Sure, I don't like any of them ;) Mostly was having a grumpy day rather than suggesting a change in this. > > Now, an argument can be made that there are too many CXL sub-objects and > more can be merged into a monolithic cxl_core object. The flipside of > that is reduced testability, at least via symbol mocking techniques. > Just look at the recent case where the fact that > drivers/cxl/core/region.c is built into cxl_core.o rather than its own > cxl_region.o object results in an in-line code change to support > cxl_test [1]. There are tradeoffs. Absolutely agree. > > > solution is probably more painful. Long term we should really get > > a regression test for these ordering issues in place in one of > > the CIs. > > The final patch in this series does improve cxl_test's ability to catch > regressions in module init order, and that ordering change did uncover a > bug. The system works! 😅 That was indeed good to see! > > Going further the test mode that is needed, in addition to QEMU > emulation and cxl_test interface mocking, is kunit or similar [2] > infrastructure for some function-scope unit tests. > > [1]: http://lore.kernel.org/20241022030054.258942-1-lizhijian@xxxxxxxxxxx > [2]: http://lore.kernel.org/170795677066.3697776.12587812713093908173.stgit@ubuntu > Yeah, we have a long way to go on testing (common problem!) Jonathan