On Mon, 2019-02-25 at 10:57 -0800, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> This is intended for use with NVDIMMs that are physically persistent
> (physically like flash) so that they can be used as a cost-effective
> RAM replacement. Intel Optane DC persistent memory is one
> implementation of this kind of NVDIMM.
>
> Currently, a persistent memory region is "owned" by a device driver,
> either the "Direct DAX" or "Filesystem DAX" drivers. These drivers
> allow applications to explicitly use persistent memory, generally
> by being modified to use special, new libraries. (DIMM-based
> persistent memory hardware/software is described in great detail
> here: Documentation/nvdimm/nvdimm.txt).
>
> However, this limits persistent memory use to applications which
> *have* been modified. To make it more broadly usable, this driver
> "hotplugs" memory into the kernel, to be managed and used just like
> normal RAM would be.
>
> To make this work, management software must remove the device from
> being controlled by the "Device DAX" infrastructure:
>
>     echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
>
> and then tell the new driver that it can bind to the device:
>
>     echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
>
> After this, there will be a number of new memory sections visible
> in sysfs that can be onlined, or that may get onlined by existing
> udev-initiated memory hotplug rules.
>
> This rebinding procedure is currently a one-way trip. Once memory
> is bound to "kmem", it's there permanently and can not be
> unbound and assigned back to device_dax.
>
> The kmem driver will never bind to a dax device unless the device
> is *explicitly* bound to the driver. There are two reasons for
> this: One, since it is a one-way trip, it can not be undone if
> bound incorrectly. Two, the kmem driver destroys data on the
> device. Think of if you had good data on a pmem device.
> It would be catastrophic if you compile-in "kmem", but leave out
> the "device_dax" driver. kmem would take over the device and
> write volatile data all over your good data.
>
> This inherits any existing NUMA information for the newly-added
> memory from the persistent memory device that came from the
> firmware. On Intel platforms, the firmware has guarantees that
> require each socket's persistent memory to be in a separate
> memory-only NUMA node. That means that this patch is not expected
> to create NUMA nodes, but will simply hotplug memory into existing
> nodes.
>
> Because NUMA nodes are created, the existing NUMA APIs and tools
> are sufficient to create policies for applications or memory areas
> to have affinity for or an aversion to using this memory.
>
> There is currently some metadata at the beginning of pmem regions.
> The section-size memory hotplug restrictions, plus this small
> reserved area can cause the "loss" of a section or two of capacity.
> This should be fixable in follow-on patches. But, as a first step,
> losing 256MB of memory (worst case) out of hundreds of gigabytes
> is a good tradeoff vs. the required code to fix this up precisely.
> This calculation is also the reason we export
> memory_block_size_bytes().
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Reviewed-by: Keith Busch <keith.busch@xxxxxxxxx>
> Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
> Cc: Ross Zwisler <zwisler@xxxxxxxxxx>
> Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: linux-nvdimm@xxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
> Cc: Huang Ying <ying.huang@xxxxxxxxx>
> Cc: Fengguang Wu <fengguang.wu@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: Yaowei Bai <baiyaowei@xxxxxxxxxxxxxxxxxxxx>
> Cc: Takashi Iwai <tiwai@xxxxxxx>
> Cc: Jerome Glisse <jglisse@xxxxxxxxxx>
> ---
>
>  b/drivers/base/memory.c |    1
>  b/drivers/dax/Kconfig   |   16 +++++++
>  b/drivers/dax/Makefile  |    1
>  b/drivers/dax/kmem.c    |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 126 insertions(+)

Looks good,
Reviewed-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>