On Fri, Jun 7, 2019 at 12:57 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 6/7/19 12:27 PM, Dan Williams wrote:
> > In support of optionally allowing either application-exclusive and
> > core-kernel-mm managed access to differentiated memory, claim
> > EFI_MEMORY_SP ranges for exposure as device-dax instances by default.
> > Such instances can be directly owned / mapped by a
> > platform-topology-aware application. Alternatively, with the new kmem
> > facility [4], the administrator has the option to instead designate that
> > those memory ranges be hot-added to the core-kernel-mm as a unique
> > memory numa-node. In short, allow for the decision about what software
> > agent manages specific-purpose memory to be made at runtime.
>
> It's probably worth noting that the reason the memory lands into the
> state of being controlled by device-dax by default is that device-dax is
> nice. It's actually willing and able to give up ownership of the memory
> when we ask. If we added to the core-mm, we'd almost certainly not be
> able to get it back reliably.
>
> Anyway, thanks for doing these, and I really hope that the world's
> BIOSes actually use this flag.

It should be noted that the flag is necessary, but not sufficient to
route this memory range to device-dax. The BIOS must also publish ACPI
HMAT performance data for the range so the OS has a chance of knowing
*why* the memory is "reserved for a specific purpose", and delineate
the boundaries of multiple performance differentiated memory ranges
that might be combined into one shared / contiguous EFI memory
descriptor. With no HMAT the memory will be reserved, but no
dax-device will be surfaced.

Perhaps this implementation also needs a WARN_TAINT(...,
TAINT_FIRMWARE_WORKAROUND...) to scream about a BIOS that fails to
publish the required HMAT entries, or perhaps even better a command
line option to ignore the flag so that the core-mm can pick up the
memory by default?
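
To make the routing above concrete, here is a rough userspace sketch of the
decision, not the actual patch. The enum, the struct, sp_route_range() and
the ignore_sp_flag parameter are all hypothetical names for illustration;
ignore_sp_flag stands in for the command line option that is only being
floated here, and the fprintf() stands in for where a WARN_TAINT() might go.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes for one EFI memory range (illustrative names). */
enum sp_route {
	SP_ROUTE_CORE_MM,	/* ordinary memory, managed by the core-mm */
	SP_ROUTE_DEVICE_DAX,	/* reserved and surfaced as a device-dax instance */
	SP_ROUTE_RESERVED_ONLY,	/* reserved, but no dax-device is surfaced */
};

/* Hypothetical summary of what firmware told us about the range. */
struct sp_range {
	bool efi_memory_sp;	/* EFI_MEMORY_SP set on the memory descriptor */
	bool hmat_published;	/* HMAT performance data covers the range */
};

/*
 * The flag is necessary but not sufficient: device-dax only gets the
 * range when HMAT data is present as well. ignore_sp_flag models the
 * proposed (not existing) override that hands the range to the core-mm.
 */
static enum sp_route sp_route_range(const struct sp_range *r,
				    bool ignore_sp_flag)
{
	if (!r->efi_memory_sp || ignore_sp_flag)
		return SP_ROUTE_CORE_MM;

	if (r->hmat_published)
		return SP_ROUTE_DEVICE_DAX;

	/* Stand-in for a WARN_TAINT(..., TAINT_FIRMWARE_WORKAROUND, ...). */
	fprintf(stderr, "firmware: EFI_MEMORY_SP range without HMAT data\n");
	return SP_ROUTE_RESERVED_ONLY;
}
```

With this shape, a range that has the flag but no HMAT ends up in the
"reserved only" state unless the administrator passes the override, in
which case it falls through to the core-mm like any other memory.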