On 14.01.20 12:09, Anshuman Khandual wrote: > > > On 01/14/2020 07:43 AM, Anshuman Khandual wrote: >> >> >> On 01/13/2020 04:07 PM, David Hildenbrand wrote: >>> On 13.01.20 10:50, Anshuman Khandual wrote: >>>> >>>> >>>> On 01/13/2020 02:44 PM, David Hildenbrand wrote: >>>>> >>>>> >>>>>> Am 13.01.2020 um 10:10 schrieb Anshuman Khandual <anshuman.khandual@xxxxxxx>: >>>>>> >>>>>> >>>>>> >>>>>>> On 01/10/2020 02:12 PM, David Hildenbrand wrote: >>>>>>>> On 10.01.20 04:09, Anshuman Khandual wrote: >>>>>>>> Currently there are two interfaces to initiate memory range hot removal i.e >>>>>>>> remove_memory() and __remove_memory() which then calls try_remove_memory(). >>>>>>>> Platform gets called with arch_remove_memory() to tear down required kernel >>>>>>>> page tables and other arch specific procedures. But there are platforms >>>>>>>> like arm64 which might want to prevent removal of certain specific memory >>>>>>>> ranges irrespective of their present usage or movability properties. >>>>>>> >>>>>>> Why? Is this only relevant for boot memory? I hope so, otherwise the >>>>>>> arch code needs fixing IMHO. >>>>>> >>>>>> Right, it is relevant only for the boot memory on arm64 platform. But this >>>>>> new arch callback makes it flexible to reject any given memory range. >>>>>> >>>>>>> >>>>>>> If it's only boot memory, we should disallow offlining instead via a >>>>>>> memory notifier - much cleaner. >>>>>> >>>>>> Dont have much detail understanding of MMU notifier mechanism but from some >>>>>> initial reading, it seems like we need to have a mm_struct for a notifier >>>>>> to monitor various events on the page table. Just wondering how a physical >>>>>> memory range like boot memory can be monitored because it can be used both >>>>>> for for kernel (init_mm) or user space process at same time. Is there some >>>>>> mechanism we could do this ? >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Current arch call back arch_remove_memory() is too late in the process to >>>>>>>> abort memory hot removal as memory block devices and firmware memory map >>>>>>>> entries would have already been removed. Platforms should be able to abort >>>>>>>> the process before taking the mem_hotplug_lock with mem_hotplug_begin(). >>>>>>>> This essentially requires a new arch callback for memory range validation. >>>>>>> >>>>>>> I somewhat dislike this very much. Memory removal should never fail if >>>>>>> used sanely. See e.g., __remove_memory(), it will BUG() whenever >>>>>>> something like that would strike. >>>>>>> >>>>>>>> >>>>>>>> This differentiates memory range validation between memory hot add and hot >>>>>>>> remove paths before carving out a new helper check_hotremove_memory_range() >>>>>>>> which incorporates a new arch callback. This call back provides platforms >>>>>>>> an opportunity to refuse memory removal at the very onset. In future the >>>>>>>> same principle can be extended for memory hot add path if required. >>>>>>>> >>>>>>>> Platforms can choose to override this callback in order to reject specific >>>>>>>> memory ranges from removal or can just fallback to a default implementation >>>>>>>> which allows removal of all memory ranges. >>>>>>> >>>>>>> I suspect we want really want to disallow offlining instead. E.g., I >>>>>> >>>>>> If boot memory pages can be prevented from being offlined for sure, then it >>>>>> would indirectly definitely prevent hot remove process as well. >>>>>> >>>>>>> remember s390x does that with certain areas needed for dumping/kexec. >>>>>> >>>>>> Could not find any references to mmu_notifier in arch/s390 or any other arch >>>>>> for that matter apart from KVM (which has an user space component), could you >>>>>> please give some pointers ? >>>>> >>>>> Memory (hotplug) notifier, not MMU notifier :) >>>> >>>> They are so similarly named :) >>>> >>>>> >>>>> Not on my notebook right now, grep for MEM_GOING_OFFLINE, that should be it. >>>>> >>>> >>>> Got it, thanks ! But we will still need boot memory enumeration via MEMBLOCK_BOOT >>>> to reject affected offline requests in the callback. >>> >>> Do you really need that? >>> >>> We have SECTION_IS_EARLY. You could iterate all involved sections (for >>> which you are getting notified) and check if any one of these is marked >>> SECTION_IS_EARLY. then, it was added during boot and not via add_memory(). >> >> Seems to be a better approach than adding a new memblock flag. > > These additional changes do the trick and prevent boot memory removal. > Hope this is in line with your earlier suggestion. > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > index 00f3e1836558..3b59e6a29dea 100644 > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -17,6 +17,7 @@ > +++ b/arch/arm64/mm/mmu.c > @@ -17,6 +17,7 @@ > #include <linux/mman.h> > #include <linux/nodemask.h> > #include <linux/memblock.h> > +#include <linux/memory.h> > #include <linux/fs.h> > #include <linux/io.h> > #include <linux/mm.h> > @@ -1365,4 +1366,37 @@ void arch_remove_memory(int nid, u64 start, u64 size, > __remove_pages(start_pfn, nr_pages, altmap); > __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size); > } > + > +static int boot_mem_remove_notifier(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + unsigned long start_pfn, end_pfn, pfn, section_nr; > + struct mem_section *ms; > + struct memory_notify *arg = data; > + > + start_pfn = > + end_pfn = start_pfn + arg->nr_pages; You can initialize some of these directly struct memory_notify *arg = data; const unsigned long end_pfn = arg->start_pfn; + arg->nr_pages; unsigned long pfn = arg->start_pfn; and avoid start_pfn. > + > + if (action != MEM_GOING_OFFLINE) > + return NOTIFY_OK; > + > + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { > + section_nr = ; > + ms = __nr_to_section(section_nr); Also, I think you can avoid section_nr. ms = __nr_to_section(pfn_to_section_nr(pfn)); > + > + if (early_section(ms)) > + return NOTIFY_BAD; > + } > + return NOTIFY_OK; > +} > + > +static struct notifier_block boot_mem_remove_nb = { > + .notifier_call = boot_mem_remove_notifier, > +}; > + > +static int __init boot_mem_remove_init(void) > +{ > + return register_memory_notifier(&boot_mem_remove_nb); > +} > +device_initcall(boot_mem_remove_init); > #endif Exactly what I was suggesting :) If we ever need to offline+re-online boot memory (e.g., to a different zone), we can think of something else. -- Thanks, David / dhildenb