On Fri, Aug 14, 2015 at 02:52:15PM -0700, Dan Williams wrote:
> On Fri, Aug 14, 2015 at 2:37 PM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> > On Wed, Aug 12, 2015 at 11:50:05PM -0400, Dan Williams wrote:
> >> While pmem is usable as a block device or via DAX mappings to userspace
> >> there are several usage scenarios that can not target pmem due to its
> >> lack of struct page coverage. In preparation for "hot plugging" pmem
> >> into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
> >> separately from the ones that are subject to standard page allocations.
> >> Importantly "device memory" can be removed at will by userspace
> >> unbinding the driver of the device.
> >>
> >> Having a separate zone prevents allocation and otherwise marks these
> >> pages that are distinct from typical uniform memory. Device memory has
> >> different lifetime and performance characteristics than RAM. However,
> >> since we have run out of ZONES_SHIFT bits this functionality currently
> >> depends on sacrificing ZONE_DMA.
> >>
> >> arch_add_memory() is reorganized a bit in preparation for a new
> >> arch_add_dev_memory() api, for now there is no functional change to the
> >> memory hotplug code.
> >>
> >> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> >> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> >> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >> Cc: Rik van Riel <riel@xxxxxxxxxx>
> >> Cc: Mel Gorman <mgorman@xxxxxxx>
> >> Cc: linux-mm@xxxxxxxxx
> >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >> ---
> >> arch/x86/Kconfig | 13 +++++++++++++
> >> arch/x86/mm/init_64.c | 32 +++++++++++++++++++++-----------
> >> include/linux/mmzone.h | 23 +++++++++++++++++++++++
> >> mm/memory_hotplug.c | 5 ++++-
> >> mm/page_alloc.c | 3 +++
> >> 5 files changed, 64 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> >> index b3a1a5d77d92..64829b17980b 100644
> >> --- a/arch/x86/Kconfig
> >> +++ b/arch/x86/Kconfig
> >> @@ -308,6 +308,19 @@ config ZONE_DMA
> >>
> >> If unsure, say Y.
> >>
> >> +config ZONE_DEVICE
> >> + bool "Device memory (pmem, etc...) hotplug support" if EXPERT
> >> + default !ZONE_DMA
> >> + depends on !ZONE_DMA
> >> + help
> >> + Device memory hotplug support allows for establishing pmem,
> >> + or other device driver discovered memory regions, in the
> >> + memmap. This allows pfn_to_page() lookups of otherwise
> >> + "device-physical" addresses which is needed for using a DAX
> >> + mapping in an O_DIRECT operation, among other things.
> >> +
> >> + If FS_DAX is enabled, then say Y.
> >> +
> >> config SMP
> >> bool "Symmetric multi-processing support"
> >> ---help---
> >> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> >> index 3fba623e3ba5..94f0fa56f0ed 100644
> >> --- a/arch/x86/mm/init_64.c
> >> +++ b/arch/x86/mm/init_64.c
> [..]
> >> @@ -701,11 +694,28 @@ int arch_add_memory(int nid, u64 start, u64 size)
> >> ret = __add_pages(nid, zone, start_pfn, nr_pages);
> >> WARN_ON_ONCE(ret);
> >>
> >> - /* update max_pfn, max_low_pfn and high_memory */
> >> - update_end_of_memory_vars(start, size);
> >> + /*
> >> + * Update max_pfn, max_low_pfn and high_memory, unless we added
> >> + * "device memory" which should not effect max_pfn
> >> + */
> >> + if (!is_dev_zone(zone))
> >> + update_end_of_memory_vars(start, size);
> >
> > What is the rational for not updating max_pfn, max_low_pfn, ... ?
> >
>
> The idea is that this memory is not meant to be available to the page
> allocator and should not count as new memory capacity. We're only
> hotplugging it to get struct page coverage.

But relying on max_pfn staying smaller than first_dev_pfn sounds bogus
to me. For instance, you might plug in a device that registers device
memory, and then some regular memory might be hotplugged, effectively
updating max_pfn to a value bigger than first_dev_pfn.

Also, I do not think the buddy allocator uses max_pfn or max_low_pfn to
decide whether to consider a page/zone for allocation.

Cheers,
Jérôme
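
The hotplug ordering concern raised in the reply can be illustrated with a
toy user-space model. This is only a sketch under stated assumptions, not
kernel code: struct mem_model, model_add_memory() and
max_pfn_excludes_dev() are hypothetical names invented for illustration,
and first_dev_pfn is taken from the reply's wording rather than from the
quoted patch. The model skips the max_pfn update for device ranges, as the
quoted hunk does, and shows that a later regular hotplug above the device
range still pushes max_pfn past first_dev_pfn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; every name here is hypothetical, not a kernel symbol. */
struct mem_model {
	uint64_t max_pfn;       /* end (exclusive) of regular memory */
	uint64_t first_dev_pfn; /* lowest device-memory pfn, 0 if none */
};

/*
 * Add a range to the model. Device ranges leave max_pfn untouched,
 * mirroring the "if (!is_dev_zone(zone))" check in the quoted hunk.
 */
static void model_add_memory(struct mem_model *m, uint64_t start_pfn,
			     uint64_t nr_pages, bool is_dev)
{
	uint64_t end_pfn = start_pfn + nr_pages;

	assert(nr_pages > 0);

	if (is_dev) {
		if (m->first_dev_pfn == 0 || start_pfn < m->first_dev_pfn)
			m->first_dev_pfn = start_pfn;
		return; /* device memory: do not advance max_pfn */
	}

	if (end_pfn > m->max_pfn)
		m->max_pfn = end_pfn;
}

/* True while a "pfn < max_pfn" test would still exclude device memory. */
static bool max_pfn_excludes_dev(const struct mem_model *m)
{
	return m->first_dev_pfn == 0 || m->max_pfn <= m->first_dev_pfn;
}
```

Starting from max_pfn = 0x100000, adding a device range at pfn 0x200000
leaves max_pfn unchanged. A subsequent regular hotplug at pfn 0x300000
then raises max_pfn above first_dev_pfn, so max_pfn_excludes_dev()
becomes false.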