On Sun, Feb 05, 2023 at 05:04:05PM -0800, Dan Williams wrote:
> The default mode for device-dax instances is backwards for RAM-regions
> as evidenced by the fact that it tends to catch end users by surprise.
> "Where is my memory?". Recall that platforms are increasingly shipping
> with performance-differentiated memory pools beyond typical DRAM and
> NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic
> interleave, varied media types, and future fabric attached
> possibilities).
>
> For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux
> 'Soft Reserved') attribute is expected to be applied to all memory-pools
> that are not the general purpose pool. This designation gives an
> Operating System a chance to defer usage of a memory pool until later in
> the boot process where its performance properties can be interrogated
> and administrator policy can be applied.
>
> 'Soft Reserved' memory can be anything from too limited and precious to
> be part of the general purpose pool (HBM), too slow to host hot kernel
> data structures (some PMEM media), or anything in between. However, in
> the absence of an explicit policy, the memory should at least be made
> usable by default. The current device-dax default hides all
> non-general-purpose memory behind a device interface.
>
> The expectation is that the distribution of users that want the memory
> online by default vs device-dedicated-access by default follows the
> Pareto principle. A small number of enlightened users may want to do
> userspace memory management through a device, but general users just
> want the kernel to make the memory available with an option to get more
> advanced later.
>
> Arrange for all device-dax instances not backed by PMEM to default to
> attaching to the dax_kmem driver. From there the baseline memory hotplug
> policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=)
> gates whether the memory comes online or stays offline. Where, if it
> stays offline, it can be reliably converted back to device-mode where it
> can be partitioned, or fronted by a userspace allocator.
>
> So, if someone wants device-dax instances for their 'Soft Reserved'
> memory:
>
> 1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot
>    with memhp_default_state=offline, or roll the dice and hope that the
>    kernel has not pinned a page in that memory before step 2.
>
> 2/ Write a udev rule to convert the target dax device(s) from
>    'system-ram' mode to 'devdax' mode:
>
>    daxctl reconfigure-device $dax -m devdax -f
>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

Stupid question: when defaulting to online, do these devices get placed
into ZONE_NORMAL?

Is there a way for us, at a minimum, to online this memory as ZONE_MOVABLE,
both to help with the "hope the kernel has not pinned a page" problem and
to keep kernel resources out of this zone in general?

If this is covered by a different patch or already set up this way,
ignore me :]

~Gregory
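
For anyone checking this on a running system, here is a minimal sketch (not
part of the patch) of reading the memory hotplug default that gates the
dax_kmem onlining described in the quoted text. It assumes the kernel
exposes /sys/devices/system/memory/auto_online_blocks with the values
offline, online, online_kernel, and online_movable. The function name
describe_default_online and its path argument are hypothetical, added only
so the logic can be tried against a plain file.

```shell
#!/bin/sh
# Sketch only: report what the current hotplug default implies for newly
# added memory blocks. The path argument is an assumption of this sketch;
# it defaults to the sysfs knob when omitted.

describe_default_online() {
    policy_file=${1:-/sys/devices/system/memory/auto_online_blocks}

    # Bail out if the knob is missing (e.g. memory hotplug not configured).
    if [ ! -r "$policy_file" ]; then
        echo "unknown"
        return 1
    fi

    # The file holds a single word naming the default state.
    read -r policy < "$policy_file"

    case "$policy" in
        offline)        echo "stays offline" ;;
        online_movable) echo "auto-onlined into ZONE_MOVABLE" ;;
        online_kernel)  echo "auto-onlined into a kernel zone" ;;
        online)         echo "auto-onlined, zone chosen by the kernel" ;;
        *)              echo "unknown"; return 1 ;;
    esac
}
```

Booting with memhp_default_state=online_movable should make this report the
ZONE_MOVABLE case. Whether that is the right default for 'Soft Reserved'
memory is the open question above.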