Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 24 Oct 2024 09:19:17 -0700
Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

> Jonathan Cameron wrote:
> > On Tue, 22 Oct 2024 18:43:24 -0700
> > Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >   
> > > When the CXL subsystem is built-in the module init order is determined
> > > by Makefile order. That order violates expectations. The expectation is
> > > that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> > > the race cxl_mem will find the enabled CXL root ports it needs and if
> > > cxl_acpi loses the race it will retrigger cxl_mem to attach via
> > > cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> > > enabled immediately upon cxl_acpi_probe() return. That in turn can only
> > > happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> > > before the cxl_acpi object in the Makefile.
> > > 
> > > Fix up the order to prevent initialization failures, and make sure that
> > > cxl_port is built-in if cxl_acpi is also built-in.
> > > 
> > > As for what contributed to this not being found earlier, the CXL
> > > regression environment, cxl_test, builds all CXL functionality as a
> > > module to allow to symbol mocking and other dynamic reload tests.  As a
> > > result there is no regression coverage for the built-in case.
> > > 
> > > Reported-by: Gregory Price <gourry@xxxxxxxxxx>
> > > Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx
> > > Tested-by: Gregory Price <gourry@xxxxxxxxxx>
> > > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")  
> > 
> > I don't like this due to likely long term fragility, but any other  
> 
> Please be specific about the fragility and how is this different than
> any other Makefile order fragility, like the many cases in
> drivers/Makefile/, or patches fixing up initcall order?

Sure, I don't like any of them ;) Mostly was having a grumpy day
rather than suggesting a change in this.

> 
> Now, an argument can be made that there are too many CXL sub-objects and
> more can be merged into a monolithic cxl_core object. The flipside of
> that is reduced testability, at least via symbol mocking techniques.
> Just look at the recent case where the fact that
> drivers/cxl/core/region.c is built into cxl_core.o rather than its own
> cxl_region.o object results in an in-line code change to support
> cxl_test [1]. There are tradeoffs.
Absolutely agree. 
> 
> > solution is probably more painful.  Long term we should really get
> > a regression test for these ordering issues in place in one of
> > the CIs.  
> 
> The final patch in this series does improve cxl_test's ability to catch
> regressions in module init order, and that ordering change did uncover a
> bug. The system works! 😅
That was indeed good to see!
> 
> Going further the test mode that is needed, in addition to QEMU
> emulation and cxl_test interface mocking, is kunit or similar [2]
> infrastructure for some function-scope unit tests.
> 
> [1]: http://lore.kernel.org/20241022030054.258942-1-lizhijian@xxxxxxxxxxx
> [2]: http://lore.kernel.org/170795677066.3697776.12587812713093908173.stgit@ubuntu
> 
Yeah, we have a long way to go on testing (common problem!)

Jonathan






[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux