On 09/11/2022 02:42, Muchun Song wrote:
>> On Nov 8, 2022, at 18:38, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>> On 08/11/2022 09:13, Muchun Song wrote:
>>>> On Nov 7, 2022, at 23:39, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>> +	int nid = page_to_nid((struct page *)start);
>>>> +	struct page *page = NULL;
>>>> +
>>>> +	/*
>>>> +	 * Allocate a new head vmemmap page to avoid breaking a contiguous
>>>> +	 * block of struct page memory when freeing it back to page allocator
>>>> +	 * in free_vmemmap_page_list(). This will allow the likely contiguous
>>>> +	 * struct page backing memory to be kept contiguous and allowing for
>>>> +	 * more allocations of hugepages. Fallback to the currently
>>>> +	 * mapped head page in case should it fail to allocate.
>>>> +	 */
>>>> +	if (IS_ALIGNED((unsigned long)start, PAGE_SIZE))
>>>
>>> I'm curious why we need this check. IIUC, this is unnecessary.
>>>
>>
>> So if the start of the vmemmap range (the head page) we will remap isn't the
>> first struct page, then we would corrupt the other struct pages in
>> that vmemmap page unrelated to hugetlb? That was my thinking
>
> Actually, @start address should be always aligned with PAGE_SIZE. If not,
> vmemmap_remap_range() will complain. So the check can be removed.
>

True, but it will be a VM_BUG_ON(). I can remove the check here, but I
would suggest that, rather than crashing, we detect the error in the
caller (vmemmap_remap_free()) before proceeding.

>>
>>>> +	page = alloc_pages_node(nid, gfp_mask, 0);
>>>> +	walk.head_page = page;
>>>>
>>>> 	/*
>>>> 	 * In order to make remapping routine most efficient for the huge pages,
>>>> --
>>>> 2.17.2
>>>>
>>>
>>> I have implemented a version based on yours, which does not introduce
>>> ->head_page field (Not test if it works). Seems to be simple.
>>>
>>
>> Let me try out with the adjustment below
>>
>>> Thanks.
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c98805d5b815..8ee94f6a6697 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -202,12 +202,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>>> 		return ret;
>>> 	} while (pgd++, addr = next, addr != end);
>>>
>>> -	/*
>>> -	 * We only change the mapping of the vmemmap virtual address range
>>> -	 * [@start + PAGE_SIZE, end), so we only need to flush the TLB which
>>> -	 * belongs to the range.
>>> -	 */
>>> -	flush_tlb_kernel_range(start + PAGE_SIZE, end);
>>> +	flush_tlb_kernel_range(start, end);
>>>
>>> 	return 0;
>>> }
>>> @@ -246,6 +241,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>>> 	pte_t entry = mk_pte(walk->reuse_page, pgprot);
>>> 	struct page *page = pte_page(*pte);
>>>
>>> +	/* The case of remapping the head vmemmap page. */

I changed this comment to:

	/* Remapping the head page requires r/w */

>>> +	if (unlikely(addr == walk->reuse_addr)) {
>>
>> You replace the head_page with checking the reuse_addr, but that is
>> set on the tail page. So if we want to rely on reuse_addr perhaps
>> best if we do:
>>
>> if (unlikely(addr == (walk->reuse_addr - PAGE_SIZE))) {
>> 	...
>> }
>
> I don't think so. The @addr here should be equal to @walk->reuse_addr
> when vmemmap_remap_pte() is fist called since @addr does not be updated
> from vmemmap_pte_range(). Right?
>

Ah, yes -- I misread it. I was confusing this with the 2-page vmemmap
reuse rather than the 1-page case (which is the latest, yes), and thus
reuse_addr points at the head page.
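
Just so we are looking at the same thing, below is roughly the shape I
have in mind for vmemmap_remap_pte() with your adjustment folded in.
Treat it as a sketch of my understanding rather than the exact hunk I
will send: the PAGE_KERNEL vs PAGE_KERNEL_RO split is my reading of the
"requires r/w" comment, and I am assuming the existing reuse_page,
reuse_addr and vmemmap_pages fields of struct vmemmap_remap_walk stay
as they are:

static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
			      struct vmemmap_remap_walk *walk)
{
	/*
	 * Remap the tail pages as read-only to catch illegal write
	 * operations to the tail pages.
	 */
	pgprot_t pgprot = PAGE_KERNEL_RO;
	struct page *page = pte_page(*pte);
	pte_t entry;

	/* Remapping the head page requires r/w */
	if (unlikely(addr == walk->reuse_addr))
		pgprot = PAGE_KERNEL;

	entry = mk_pte(walk->reuse_page, pgprot);
	list_add_tail(&page->lru, walk->vmemmap_pages);
	set_pte_at(&init_mm, addr, pte, entry);
}

Since @addr matches @walk->reuse_addr only on the first pte visited,
only the head page ends up with the r/w mapping and the tails keep the
read-only remap as before.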
Ok, this should be much cleaner with the adjustments you proposed, and it
works just as well as far as I have tested. I'll respin v3.