On 12/11/2024 4:30 PM, Dan Williams wrote:
> Nathan Fontenot wrote:
>> Update handling of SOFT RESERVE iomem resources that intersect with
>> CXL region resources to remove the intersections from the SOFT RESERVE
>> resources. The current approach of leaving the SOFT RESERVE
>> resource as is can cause failures during hotplug replace of CXL
>> devices because the resource is not available for reuse after
>> teardown of the CXL device.
>>
>> The approach is to trim out any pieces of SOFT RESERVE resources
>> that intersect CXL regions. To do this, during
>> e820__reserve_resources_late(), first set aside into a separate
>> resource tree any SOFT RESERVE resources that intersect with a CFMWS
>> and would otherwise have been added to the iomem resource tree.
>>
>> As CXL regions are created, the CXL resource created for each new
>> region is used to trim intersections from the SOFT RESERVE
>> resources that were previously set aside.
>>
>> Once CXL device probe has completed, any remaining SOFT RESERVE
>> resources are added to the iomem resource tree. As each resource
>> is added to the iomem resource tree, a new notifier chain is invoked
>> to notify the dax driver of newly added SOFT RESERVE resources so that
>> the dax driver can consume them.
>
> Hi Nathan, this patch hits on all the mechanisms I would expect, but upon
> reading it there is an opportunity to zoom out and do something blunter
> than the surgical precision of this current proposal.
>
> In other words, I appreciate the consideration of potential corner
> cases, but for overall maintainability this should aim to be an
> all-or-nothing approach.
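For reference, the trimming step in the patch description (cutting each newly created CXL region out of the set-aside SOFT RESERVE ranges) boils down to interval subtraction. Below is a minimal userspace sketch in C, assuming inclusive start/end bounds as struct resource uses; `struct sr_range` and `sr_trim()` are hypothetical names for illustration, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range, mirroring how struct resource stores bounds. */
struct sr_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/*
 * Remove the part of @sr that overlaps @region. Writes the 0, 1 or 2
 * surviving pieces of @sr to @out and returns how many were written.
 */
static size_t sr_trim(struct sr_range sr, struct sr_range region,
		      struct sr_range out[2])
{
	size_t n = 0;

	/* No overlap: the Soft Reserved range survives unchanged. */
	if (region.end < sr.start || region.start > sr.end) {
		out[n++] = sr;
		return n;
	}
	/* Piece below the region (region.start >= 1 here, so no underflow). */
	if (sr.start < region.start)
		out[n++] = (struct sr_range){ sr.start, region.start - 1 };
	/* Piece above the region (region.end < UINT64_MAX here, so no overflow). */
	if (sr.end > region.end)
		out[n++] = (struct sr_range){ region.end + 1, sr.end };
	return n;
}
```

Applied once per region as regions arrive, the surviving pieces are what the patch would later hand to the iomem tree; this per-region bookkeeping is exactly what the all-or-nothing suggestion aims to avoid.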
>
> Specifically, at the first sign of trouble (any CXL sub-driver probe
> failure or region enumeration timeout), the entire CXL topology should
> be torn down (trigger the equivalent of ->remove() on the ACPI0017
> device), and the deferred Soft Reserved ranges registered as if
> cxl_acpi were not present (implement a fallback equivalent to
> hmem_register_devices()).
>
> No need to trim resources as regions arrive; just tear down everything
> set up in the cxl_acpi_probe() path with devres_release_all().
>
> So, I am thinking: export a flag from the CXL core that indicates
> whether any conflict with platform-firmware-established CXL regions has
> occurred.
>
> Read that flag from a cxl_acpi-driver-launched deferred workqueue that
> is waiting for initial device probing to quiesce. If that flag indicates
> a CXL enumeration failure, then trigger devres_release_all() on the
> ACPI0017 platform device and follow that up by walking the deferred Soft
> Reserve resources to register raw (unparented by CXL regions) dax
> devices.
>
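A rough userspace model of the all-or-nothing flow proposed above, in C. Every name here (`cxl_enum_failed`, `cxl_report_enum_failure()`, `cxl_sr_deferred_check()`, and the two counter-backed stubs) is hypothetical: the stubs merely stand in for devres_release_all() on the ACPI0017 device and for a hmem_register_devices()-style fallback. In-kernel, the deferred work item might call wait_for_device_probe() before reading the flag, but that is an assumption, not something the proposal specifies.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sr_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* Hypothetical stand-in for the flag exported from the CXL core. */
static atomic_bool cxl_enum_failed = false;

/* Hypothetically called on any sub-driver probe failure or region timeout. */
static void cxl_report_enum_failure(void)
{
	atomic_store(&cxl_enum_failed, true);
}

/* Counters standing in for the real side effects. */
static int topology_teardowns;
static int raw_dax_registered;

/* Stands in for devres_release_all() on the ACPI0017 platform device. */
static void teardown_cxl_topology(void)
{
	topology_teardowns++;
}

/* Stands in for registering one raw (CXL-unparented) dax device. */
static void register_raw_dax(const struct sr_range *r)
{
	(void)r;
	raw_dax_registered++;
}

/* Body of the deferred work item, run once initial probing has quiesced. */
static void cxl_sr_deferred_check(const struct sr_range *deferred, size_t n)
{
	/* Success path: the proposal does not detail it, so nothing here. */
	if (!atomic_load(&cxl_enum_failed))
		return;

	/* Failure path: drop the whole CXL topology, then fall back. */
	teardown_cxl_topology();
	for (size_t i = 0; i < n; i++)
		register_raw_dax(&deferred[i]);
}
```

The appeal of a single flag is that the decision is made once, after probing settles, instead of per region; any failure anywhere collapses to the same fallback as a system without cxl_acpi.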