Tossing this out as larger documentation of these steps for comment,
not as a representation of what will show up in the talk.

This is trying to cover the minimum information needed to start
reasoning about the growing complexity of configurations.

Platform / BIOS / EFI Configuration
===================================

---------------------------------------
Step 1: BIOS-time hardware programming.
---------------------------------------

I don't want to focus on platform specifics, so really all you need
to know about this phase for the purpose of MM is that platforms may
program the CXL device hierarchy and lock the configuration.

In practice it means you probably can't reconfigure things after boot
without doing major teardowns of the devices and resetting them -
assuming the platform doesn't have major quirks that prevent this.

This has implications for Hotplug, Interleave, and RAS, but we'll
cover those explicitly elsewhere.  Otherwise, if something gets mucked
up at this stage - complain to your platform / hardware vendor.

------------------------------------------------------------------
Step 2: BIOS / EFI generates the CEDT (CXL Early Discovery Table).
------------------------------------------------------------------

This table is responsible for reporting each "CXL Host Bridge" and
"CXL Fixed Memory Window" present at boot - which enables early boot
software to manage those devices and the memory capacity presented by
those devices.

Example CEDT Entries (truncated)

             Subtable Type : 00 [CXL Host Bridge Structure]
                  Reserved : 00
                    Length : 0020
    Associated host bridge : 00000005

             Subtable Type : 01 [CXL Fixed Memory Window Structure]
                  Reserved : 00
                    Length : 002C
                  Reserved : 00000000
       Window base address : 000000C050000000
               Window size : 0000003CA0000000

If this memory is NOT marked "Special Purpose" by BIOS (next section),
you should find a matching entry in the EFI Memory Map and /proc/iomem

BIOS-e820: [mem 0x000000c050000000-0x000000fcefffffff] usable
/proc/iomem:
    c050000000-fcefffffff : System RAM

Observation: This memory is treated as 100% normal System RAM

1) This memory may be placed in any zone (ZONE_NORMAL, typically)
2) The kernel may use this memory for arbitrary allocations
3) The driver still enumerates CXL devices and memory regions, but
4) The CXL driver CANNOT manage this memory (as of today)
   (Caveat: *some* RAS features may still work, possibly)

This creates a nuanced management state.

The memory is online by default and completely usable, AND the driver
appears to be managing the devices - BUT the memory resources and the
management structure are fundamentally separate.

1) The CXL driver manages CXL features.
2) Non-CXL SystemRAM mechanisms surface the memory to allocators.

---------------------------------------------------------------
Step 3: EFI_MEMORY_SP - Deferring Management to the CXL Driver.
---------------------------------------------------------------

Assuming you DON'T want CXL memory to default to SystemRAM and prefer
NOT to have your kernel allocate arbitrary resources on CXL, you
probably want to defer managing these memory regions to the CXL driver.

The mechanism for this is setting the EFI_MEMORY_SP bit on CXL memory
in BIOS.  This will mark the memory "Special Purpose".

Doing this will result in your memory being marked "Soft Reserved" on
x86 and ARM (presently unknown on other architectures).
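
One way to check which state a given window landed in is to compute
the window's last address from the CEDT fields (base + size - 1) and
look for that exact range in /proc/iomem.  Below is a minimal Python
sketch of that check, not anything the kernel or CXL tooling ships.
The function names window_end() and classify_window() are made up for
illustration, and the exact-range match is a simplifying assumption:
a real system may split or merge resources, in which case this would
return None.

```python
import re

# Matches /proc/iomem-style lines such as
#   "c050000000-fcefffffff : System RAM"
# Leading whitespace (nested resources) is tolerated.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def window_end(base: int, size: int) -> int:
    """Last byte address covered by a window (inclusive)."""
    return base + size - 1


def classify_window(iomem_text: str, base: int, size: int):
    """Return the label of the iomem resource that exactly matches
    [base, base + size - 1], or None if no such resource exists.

    Hypothetical helper: assumes the window shows up as one resource.
    """
    end = window_end(base, size)
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if not m:
            continue
        start, stop = int(m.group(1), 16), int(m.group(2), 16)
        if start == base and stop == end:
            return m.group(3)
    return None


# Values taken from the example CFMWS entry in Step 2.
BASE = 0x000000C050000000
SIZE = 0x0000003CA0000000

print(hex(window_end(BASE, SIZE)))  # 0xfcefffffff
print(classify_window("c050000000-fcefffffff : Soft Reserved", BASE, SIZE))
```

On a live system the text would come from reading /proc/iomem instead
of a literal string; note that unprivileged reads typically show the
addresses zeroed out, so this needs root to be useful.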

You will see Memory Map and iomem entries like so:

BIOS-e820: [mem 0x000000c050000000-0x000000fcefffffff] soft reserved
/proc/iomem:
    c050000000-fcefffffff : Soft Reserved

Unless of course:
1) CONFIG_EFI_SOFT_RESERVE=n in your build config, or
2) You set the nosoftreserve boot parameter (efi=nosoftreserve), or
3) You kexec'd from a kernel where condition #1 or #2 is met

In which case you'll get SystemRAM as if EFI_MEMORY_SP was never set.

(#3 was fun to debug, for some definition of fun.  Ask me over coffee)

------------------------------------------------------------
First bit of nuanced complexity: Early-Boot Resource Re-use.
------------------------------------------------------------

How are MemoryMap resources managed by a driver after being reserved
during early boot?

Example: Hot-(un)plugging a device.

What if we replace said Hot-unplugged device with a device with a
different capacity?

What if the arch/platform code combines two adjacent regions with
similar attributes before creating resources?

Recent work by Nathan Fontenot [1] has been looking to address some
of the issues with these Soft Reserved resources, either by re-using
them or by handing them off entirely to the relevant driver for
management.

[1] https://lore.kernel.org/linux-cxl/cover.1737046620.git.nathan.fontenot@xxxxxxx/

----------------------------------------------------------------------
The Complexity story up until now (what's likely to show up in slides)
----------------------------------------------------------------------

Platform and BIOS:
  May configure all the devices prior to kernel hand-off.
  May or may not support reconfiguring / hotplug.

BIOS and EFI:
  EFI_MEMORY_SP - used to defer management to drivers

Kernel Build and Boot:
  CONFIG_EFI_SOFT_RESERVE=n - Will always result in CXL as SystemRAM
  nosoftreserve             - Will always result in CXL as SystemRAM
  kexec                     - SystemRAM configs carry over to target

--------------------------------------------------------------------

Next Up:
  Driver Management      - Decoders, HPA/SPA, DAX, and RAS.
  Memory (Block) Hotplug - Zones, Auto-Online, and User Policy.
  RAS                    - Poison, MCE, and why you probably want
                           CXL=ZONE_MOVABLE.
  Interleave             - RAS and Region Management (Hotplug-ability)

~Gregory