On Wed, Aug 12, 2015 at 11:50:05PM -0400, Dan Williams wrote:
> While pmem is usable as a block device or via DAX mappings to userspace
> there are several usage scenarios that can not target pmem due to its
> lack of struct page coverage. In preparation for "hot plugging" pmem
> into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
> separately from the ones that are subject to standard page allocations.
> Importantly "device memory" can be removed at will by userspace
> unbinding the driver of the device.
>
> Having a separate zone prevents allocation and otherwise marks these
> pages that are distinct from typical uniform memory. Device memory has
> different lifetime and performance characteristics than RAM. However,
> since we have run out of ZONES_SHIFT bits this functionality currently
> depends on sacrificing ZONE_DMA.
>
> arch_add_memory() is reorganized a bit in preparation for a new
> arch_add_dev_memory() api, for now there is no functional change to the
> memory hotplug code.
>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> ---
>  arch/x86/Kconfig       | 13 +++++++++++++
>  arch/x86/mm/init_64.c  | 32 +++++++++++++++++++++-----------
>  include/linux/mmzone.h | 23 +++++++++++++++++++++++
>  mm/memory_hotplug.c    |  5 ++++-
>  mm/page_alloc.c        |  3 +++
>  5 files changed, 64 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b3a1a5d77d92..64829b17980b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -308,6 +308,19 @@ config ZONE_DMA
>
>  	  If unsure, say Y.
>
> +config ZONE_DEVICE
> +	bool "Device memory (pmem, etc...) hotplug support" if EXPERT
> +	default !ZONE_DMA
> +	depends on !ZONE_DMA
> +	help
> +	  Device memory hotplug support allows for establishing pmem,
> +	  or other device driver discovered memory regions, in the
> +	  memmap. This allows pfn_to_page() lookups of otherwise
> +	  "device-physical" addresses which is needed for using a DAX
> +	  mapping in an O_DIRECT operation, among other things.
> +
> +	  If FS_DAX is enabled, then say Y.
> +
>  config SMP
>  	bool "Symmetric multi-processing support"
>  	---help---
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3fba623e3ba5..94f0fa56f0ed 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -683,15 +683,8 @@ static void update_end_of_memory_vars(u64 start, u64 size)
>  	}
>  }
>
> -/*
> - * Memory is added always to NORMAL zone. This means you will never get
> - * additional DMA/DMA32 memory.
> - */
> -int arch_add_memory(int nid, u64 start, u64 size)
> +static int __arch_add_memory(int nid, u64 start, u64 size, struct zone *zone)
>  {
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	struct zone *zone = pgdat->node_zones +
> -		zone_for_memory(nid, start, size, ZONE_NORMAL);
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  	int ret;
> @@ -701,11 +694,28 @@ int arch_add_memory(int nid, u64 start, u64 size)
>  	ret = __add_pages(nid, zone, start_pfn, nr_pages);
>  	WARN_ON_ONCE(ret);
>
> -	/* update max_pfn, max_low_pfn and high_memory */
> -	update_end_of_memory_vars(start, size);
> +	/*
> +	 * Update max_pfn, max_low_pfn and high_memory, unless we added
> +	 * "device memory" which should not effect max_pfn
> +	 */
> +	if (!is_dev_zone(zone))
> +		update_end_of_memory_vars(start, size);

What is the rationale for not updating max_pfn, max_low_pfn, ... ?

Cheers,
Jérôme
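For reference, the is_dev_zone() check used in the init_64.c hunk is introduced by the include/linux/mmzone.h part of the patch, which is not quoted above. A minimal sketch of what such a helper presumably looks like (assuming it simply compares the zone index against the new ZONE_DEVICE entry; this is not the literal patch hunk):

	/*
	 * Sketch only, not the quoted patch: tell hot-added device
	 * memory (ZONE_DEVICE) apart from ordinary hot-added RAM by
	 * comparing the zone's index in the node's zone array.
	 */
	static inline bool is_dev_zone(const struct zone *zone)
	{
		return zone_idx(zone) == ZONE_DEVICE;
	}

With a check of this form, the arch code above can skip the max_pfn/max_low_pfn/high_memory update for device memory while still creating the struct page coverage the changelog describes.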