Hi Anshuman,

Thanks for the details on the need for removing the page tables and
vmemmap backing. Some comments on the code below.

On Tue, Sep 03, 2019 at 03:15:58PM +0530, Anshuman Khandual wrote:
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -60,6 +60,14 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>  
>  static DEFINE_SPINLOCK(swapper_pgdir_lock);
>  
> +/*
> + * This represents whether the vmalloc and vmemmap address ranges
> + * overlap with each other on an intermediate level kernel page table
> + * entry, which in turn helps in deciding whether empty kernel page
> + * table pages, if any, can be freed during a memory hotplug
> + * operation.
> + */
> +static bool vmalloc_vmemmap_overlap;

I'd say just move the static find_vmalloc_vmemmap_overlap() function
here; the compiler should be smart enough to figure out that it's just
a build-time constant (rough sketch at the end of this mail).

> @@ -770,6 +1022,28 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  void vmemmap_free(unsigned long start, unsigned long end,
>  		struct vmem_altmap *altmap)
>  {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +	/*
> +	 * FIXME: We should have called remove_pagetable(start, end, true).
> +	 * The vmemmap and vmalloc virtual ranges might share intermediate
> +	 * kernel page table entries, so removing the vmemmap range's page
> +	 * table pages here can potentially conflict with a concurrent
> +	 * vmalloc() allocation.
> +	 *
> +	 * This is primarily because vmalloc() does not take the init_mm
> +	 * ptl for the entire page table walk and its modification. Instead
> +	 * it only takes the lock while allocating and installing page
> +	 * table pages via [p4d|pud|pmd|pte]_alloc(). A page table entry
> +	 * vanishing concurrently via memory hot remove can leave
> +	 * vmalloc()'s page table walk pointers invalid on the fly, which
> +	 * can cause corruption or, worse, a crash.
> +	 *
> +	 * So free_empty_tables() only gets called where the vmalloc and
> +	 * vmemmap ranges do not overlap at any intermediate level kernel
> +	 * page table entry.
> +	 */
> +	unmap_hotplug_range(start, end, true);
> +	if (!vmalloc_vmemmap_overlap)
> +		free_empty_tables(start, end);
> +#endif
>  }

So, I see the risk of overlap, and I guess for some kernel
configurations (PAGE_SIZE == 64K) we may not be able to avoid it. If
we can, that's great; otherwise, could we rewrite the above functions
to handle floor and ceiling similarly to free_pgd_range()? (I wonder
how that function would behave if called on init_mm and a kernel
address range.) With the vmemmap start/end information available, this
would avoid freeing partially filled page table pages. Again, a rough
sketch of what I mean is at the end of this mail.

Another question: could we free the page table pages and the actual
vmemmap pages in a single pass (sorry if this has been discussed
before)?

> @@ -1048,10 +1322,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
> +{
> +	unsigned long end = start + size;
> +
> +	WARN_ON(pgdir != init_mm.pgd);
> +	remove_pagetable(start, end, false);
> +}

I think the point I've made previously still stands: you only call
remove_pagetable() with sparse_vmap == false in this patch. Just
remove the extra argument and call unmap_hotplug_range() with
sparse_vmap == false directly in remove_pagetable().
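Concretely, something like this (completely untested; I'm assuming
remove_pagetable() pairs unmap_hotplug_range() with free_empty_tables()
the same way the vmemmap path does, and I've left the locking out):

static void remove_pagetable(unsigned long start, unsigned long end)
{
	/* Unmap only; the mapped pages are never freed on this path. */
	unmap_hotplug_range(start, end, false);
	free_empty_tables(start, end);
}

static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start,
				 u64 size)
{
	WARN_ON(pgdir != init_mm.pgd);
	remove_pagetable(start, start + size);
}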
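On the build-time constant point above, what I have in mind is roughly
the below (untested; the PGDIR-level comparison is only illustrative,
your find_vmalloc_vmemmap_overlap() would check each intermediate
level):

static bool find_vmalloc_vmemmap_overlap(void)
{
	/*
	 * VMALLOC_END and VMEMMAP_START are compile-time constants, so
	 * with this function visible in mmu.c the compiler can fold the
	 * result and drop the dead free_empty_tables() branch in
	 * vmemmap_free().
	 */
	return ((VMALLOC_END - 1) & PGDIR_MASK) ==
	       (VMEMMAP_START & PGDIR_MASK);
}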
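And for the floor/ceiling idea, the pte level of the table freeing
could look roughly like the below, modelled on free_pmd_range() in
mm/memory.c (again untested; the function name is made up and
free_hotplug_pgtable_page() stands in for whatever helper your patch
uses to release a table page):

static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
				 unsigned long floor,
				 unsigned long ceiling)
{
	pte_t *ptep = pte_offset_kernel(pmdp, 0UL);
	unsigned long start = addr & PMD_MASK;

	/*
	 * Only free the pte page if [floor, ceiling) covers the whole
	 * PMD range it maps; a partially covered table may still hold
	 * live entries belonging to someone else (e.g. vmalloc). The
	 * ceiling == 0 special case of free_pgd_range() is ignored
	 * here for brevity.
	 */
	if (floor > start || ceiling < start + PMD_SIZE)
		return;

	pmd_clear(pmdp);
	__flush_tlb_kernel_pgtable(start);
	free_hotplug_pgtable_page(virt_to_page(ptep));
}

-- 
Catalin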