On 09.05.19 15:55, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices. Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>> Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: "mike.travis@xxxxxxx" <mike.travis@xxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Andrew Banman <andrew.banman@xxxxxxx>
>> Cc: Oscar Salvador <osalvador@xxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>> Cc: Qian Cai <cai@xxxxxx>
>> Cc: Wei Yang <richard.weiyang@xxxxxxxxx>
>> Cc: Arun KS <arunks@xxxxxxxxxxxxxx>
>> Cc: Mathieu Malaterre <malat@xxxxxxxxxx>
>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>> ---
>>  drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>>  include/linux/memory.h |  2 +-
>>  mm/memory_hotplug.c    | 15 ++++-----
>>  3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>>  	return 0;
>>  }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>>  /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>   */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>>  {
>> -	int ret = 0;
>> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +	unsigned long start_pfn = PFN_DOWN(start);
>> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +	unsigned long pfn;
>>  	struct memory_block *mem;
>> +	int ret = 0;
>>
>> -	mutex_lock(&mem_sysfs_mutex);
>> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>>
>> -	mem = find_memory_block(section);
>> -	if (mem) {
>> -		mem->section_count++;
>> -		put_device(&mem->dev);
>> -	} else {
>> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>> +	mutex_lock(&mem_sysfs_mutex);
>> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>> +		mem = find_memory_block(__pfn_to_section(pfn));
>> +		if (mem) {
>> +			WARN_ON_ONCE(false);
>
> One question here, the purpose of WARN_ON_ONCE(false) is? Would we trigger
> this?

It would trigger if something goes terribly wrong. We might want to remove
this once we are sure this will not happen.

In the meantime, I replaced it with:

if (WARN_ON_ONCE(mem)) {
	put_device(&mem->dev);
	ret = -EEXIST;
	break;
}

>
>> +			put_device(&mem->dev);
>> +			continue;
>> +		}
>> +		ret = init_memory_block(&mem, __pfn_to_section(pfn),
>> +					MEM_OFFLINE);
>>  		if (ret)
>> -			goto out;
>> -		mem->section_count++;
>> +			break;
>> +		mem->section_count = memory_block_size_bytes() /
>> +				     MIN_MEMORY_BLOCK_SIZE;
>
> Maybe we can leverage sections_per_block variable.

Most certainly if it does what I think it does :)

thanks!

-- 
Thanks,

David / dhildenb
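
For readers following the thread outside the kernel tree, here is a rough
user-space sketch of the loop discussed above: walk an aligned range one
memory block at a time, bail out with -EEXIST if a block already exists,
and set the section count from a precomputed sections-per-block value.
Everything prefixed with sketch_/SKETCH_ is a hypothetical stand-in (the
sizes are assumed, the array replaces the device lookup, and assert()
stands in for BUG_ON()); it is not the kernel code and does no rollback
of blocks created before the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
#define SKETCH_PAGE_SHIFT	12UL
#define SKETCH_SECTION_BYTES	(128UL << 20)	/* assumed section size */
#define SKETCH_BLOCK_BYTES	(256UL << 20)	/* assumed memory block size */
#define SKETCH_MAX_BLOCKS	8UL

struct sketch_block {
	bool present;
	unsigned long section_count;
};

/* Replaces the memory block device lookup with a plain table. */
static struct sketch_block sketch_blocks[SKETCH_MAX_BLOCKS];

/* Plays the role of the sections_per_block value mentioned in the review. */
static const unsigned long sketch_sections_per_block =
	SKETCH_BLOCK_BYTES / SKETCH_SECTION_BYTES;

/* Create one table entry per memory block in [start, start + size). */
static int sketch_register_blocks(unsigned long start, unsigned long size)
{
	const unsigned long block_nr_pages = SKETCH_BLOCK_BYTES >> SKETCH_PAGE_SHIFT;
	const unsigned long start_pfn = start >> SKETCH_PAGE_SHIFT;
	const unsigned long end_pfn = start_pfn + (size >> SKETCH_PAGE_SHIFT);
	unsigned long pfn;

	/* Start and size must be aligned to memory block granularity. */
	assert(start % SKETCH_BLOCK_BYTES == 0);
	assert(size % SKETCH_BLOCK_BYTES == 0);

	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
		unsigned long idx = pfn / block_nr_pages;

		if (idx >= SKETCH_MAX_BLOCKS)
			return -ENOSPC;
		/* An existing block is unexpected: report it and stop. */
		if (sketch_blocks[idx].present)
			return -EEXIST;

		sketch_blocks[idx].present = true;
		sketch_blocks[idx].section_count = sketch_sections_per_block;
	}
	return 0;
}
```

Calling sketch_register_blocks(0, 2 * SKETCH_BLOCK_BYTES) fills the first
two table entries; a second call covering either of those blocks returns
-EEXIST, which mirrors the behaviour of the replacement check above.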