Hi Dan,

How about letting the BIOS report a new type for kmem in the e820 table?
e.g.
#define E820_PMEM 7
#define E820_KMEM 8

Then pmem and kmem are separate, and we can easily hot-add kmem
to the memory subsystem without disturbing the existing code
(e.g. pmem, nvdimm, dax...).

I don't know whether Intel will change some hardware features for pmem
so that it is used like volatile memory in the future. Perhaps faster
than pmem, cheaper, but volatile, and no need to care about atomicity,
consistency, L2/L3 cache...

Another question: why call it kmem? What does the "k" mean?

Thanks,
Xishi Qiu

On 2018/10/23 09:11, Dan Williams wrote:
> On Mon, Oct 22, 2018 at 6:05 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>>
>> On Mon, Oct 22, 2018 at 1:18 PM Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>>>
>>> Persistent memory is cool. But, currently, you have to rewrite
>>> your applications to use it. Wouldn't it be cool if you could
>>> just have it show up in your system like normal RAM and get to
>>> it like a slow blob of memory? Well... have I got the patch
>>> series for you!
>>>
>>> This series adds a new "driver" to which pmem devices can be
>>> attached. Once attached, the memory "owned" by the device is
>>> hot-added to the kernel and managed like any other memory. On
>>> systems with an HMAT (a new ACPI table), each socket (roughly)
>>> will have a separate NUMA node for its persistent memory so
>>> this newly-added memory can be selected by its unique NUMA
>>> node.
>>>
>>> This is highly RFC, and I really want the feedback from the
>>> nvdimm/pmem folks about whether this is a viable long-term
>>> perversion of their code and device mode. It's insufficiently
>>> documented and probably not bisectable either.
>>>
>>> Todo:
>>> 1. The device re-binding hacks are ham-fisted at best. We
>>>    need a better way of doing this, especially so the kmem
>>>    driver does not get in the way of normal pmem devices.
>>> 2. When the device has no proper node, we default it to
>>>    NUMA node 0. Is that OK?
>>> 3. We muck with the 'struct resource' code quite a bit. It
>>>    definitely needs a once-over from folks more familiar
>>>    with it than I.
>>> 4. Is there a better way to do this than starting with a
>>>    copy of pmem.c?
>>
>> So I don't think we want to do patch 2, 3, or 5. Just jump to patch 7
>> and remove all the devm_memremap_pages() infrastructure and dax_region
>> infrastructure.
>>
>> The driver should be a dead simple turn around to call add_memory()
>> for the passed in range. The hard part is, as you say, arranging for
>> the kmem driver to not stand in the way of typical range / device
>> claims by the dax_pmem device.
>>
>> To me this looks like teaching the nvdimm-bus and this dax_kmem driver
>> to require explicit matching based on 'id'. The attachment scheme
>> would look like this:
>>
>>     modprobe dax_kmem
>>     echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/new_id
>>     echo dax0.0 > /sys/bus/nd/drivers/dax_pmem/unbind
>>     echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/bind
>>
>> At step 1 the dax_kmem driver will match no devices and stay out of
>> the way of dax_pmem. It learns about devices it cares about by being
>> explicitly told about them. Then unbind from the typical dax_pmem
>> driver and attach to dax_kmem to perform the one way hotplug.
>>
>> I expect udev can automate this by setting up a rule to watch for
>> device-dax instances by UUID and call a script to do the detach /
>> reattach dance.
>
> The next question is how to support this for ranges that don't
> originate from the pmem sub-system. I expect we want dax_kmem to
> register a generic platform device representing the range and have a
> generic platform driver that turns around and does the add_memory().
>
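
For reference, the detach / reattach dance quoted above could be wrapped in
a small script that a udev rule might call. This is only a rough sketch: the
sysfs paths are the ones from the proposal (the dax_kmem driver and its
new_id file are proposed, not existing, interfaces), and the function name
rebind_to_kmem and the SYSFS_ROOT override are hypothetical, added so the
sequence can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch of the proposed sequence; assumes "modprobe dax_kmem" already ran.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for dry runs.

rebind_to_kmem() {
    dev="$1"
    drivers="${SYSFS_ROOT:-/sys}/bus/nd/drivers"

    # Both driver directories must exist before we touch the device.
    for drv in dax_pmem dax_kmem; do
        if [ ! -d "$drivers/$drv" ]; then
            echo "missing driver directory: $drivers/$drv" >&2
            return 1
        fi
    done

    # 1. Tell dax_kmem which device it is allowed to match.
    echo "$dev" > "$drivers/dax_kmem/new_id" || return 1
    # 2. Release the device from the default dax_pmem driver.
    echo "$dev" > "$drivers/dax_pmem/unbind" || return 1
    # 3. Attach it to dax_kmem, which would perform the one-way hotplug.
    echo "$dev" > "$drivers/dax_kmem/bind" || return 1
}
```

Usage would be something like "modprobe dax_kmem && rebind_to_kmem dax0.0".
Keeping modprobe outside the function means the three writes can be checked
without root by pointing SYSFS_ROOT at a temporary tree.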