On Thu, Dec 12, 2024 at 05:01:53PM -0800, Alison Schofield wrote:
> BIOS labels a resource Soft Reserved and programs a region using
> that range. Later, the existing cxl path to destroy that region
> does not free up that Soft Reserved range. Users cannot create
> another region in it's place. Resource lost. We considered simply
> removing soft reserved resources on region teardown, and you can
> probably find a patches on lore doing just that.
>
> But - the problem grew. Sometimes BIOS creates an SR that is not
> aligned with the region they go on to program. Stranded resources.
> That's where the trim and give to DAX path originated.
>
> But - the problem grew. Sometimes the CXL driver fails to enumerate
> that BIOS defined region. More stranded resources. Let's find those
> too and give them to DAX. This is something we are seeing in the
> wild now and why Dan raised its priority.
>

Hm, this makes me concerned about what happens on "full hotplug"
(literal physical removal/addition) of CXL devices - kind of like
we've seen proposed with E3.S form factor devices from a variety of
vendors.

Like what happens in the following scenario (rhetorical question, I
want to test this with QEMU - but I'm on a plane right now and want to
get the experiment process down).

Boot: No CXL device is present

Post-boot: CXL device is physically hot-plugged
- There won't be a resource registered, so I would presume the
  ACPI / EFI / CXL drivers would register one.

Event 1: CXL device is shut down and removed
- Is the resource deleted? I would presume yes.
- Is this true if the CXL device *was* present at boot time?
  If I'm following correctly ^ this is the present scenario?

Let's assume the device was present at boot, and the resource is not
deleted. Now we have a "stale resource"?

Event 2A: A new CXL device is added
- Possibility 1: Same capacity - resource is reused?
- Possibility 2: Lower capacity - resource is chopped up?
- Possibility 3: Higher capacity - resource is... lost forever?
  Fails to map?
  ???

Event 2B: A new CXL device is added on a different PCI dev id, then
          Event 2A occurs.
- Is the "stale resource" reused here, or is a new one created?

I hadn't really considered the impact of hotplug on the iomem resource
blocks (soft) reserved at boot, but this is concerning.

I remember ~1.5 years ago I was prototyping with hotplug behavior in
QEMU and saw that it was possible to do runtime ACPI/PCI add/remove of
CXL devices - this worked. But I didn't look at the effects on iomem
resources - now I'm wondering what happens if I try to hot-unplug a
CXL device that was present at boot.

This won't affect me for the immediate future, but if we're mucking
around in this space, might as well ask the question. I presume we'll
find even worse corner cases here :D :| :[ :<

I do know servers with front-facing E3.S CXL devices intended for
hot-replace exist and are a real use-case. I have no idea how that is
supposed to work in the presence of stale iomem resources.

> Dan is also suggesting that at that last event - failure to enumerate
> a BIOS defined region, we tear down the entire ACPI0017 toplogy
> and give everything to DAX.
>
> What Dan called, "the minimum requirement": all Soft Reserved ranges
> end up as dax-devices sounds like the right guideline moving forward.
>

I guess the devil's in the details here. I sense an implication that
two distinct pieces of SR-providing hardware (HBM and CXL) could end
up concatenated into a single SR range? That would obviously
necessitate chopping up an SR.

So this all makes sense. I don't disagree with the need for this, I'm
just concerned that we have CXL-specific logic landing in mm/ and e820
code.

~Gregory