From: Zhigang Lu <tonnylu@xxxxxxxxxxx> When mmapping an existing hugetlbfs file with MAP_POPULATE, we find it is very time consuming. For example, mmapping a 128GB file takes about 50 milliseconds. Sampling with perfevent shows it spends 99% time in the same_page loop in follow_hugetlb_page(). samples: 205 of event 'cycles', Event count (approx.): 136686374 - 99.04% test_mmap_huget [kernel.kallsyms] [k] follow_hugetlb_page follow_hugetlb_page __get_user_pages __mlock_vma_pages_range __mm_populate vm_mmap_pgoff sys_mmap_pgoff sys_mmap system_call_fastpath __mmap64 follow_hugetlb_page() is called with pages=NULL and vmas=NULL, so for each hugepage, we run into the same_page loop for pages_per_huge_page() times, but doing nothing. With this change, it takes less then 1 millisecond to mmap a 128GB file in hugetlbfs. Signed-off-by: Zhigang Lu <tonnylu@xxxxxxxxxxx> Reviewed-by: Haozhong Zhang <hzhongzhang@xxxxxxxxxxx> Reviewed-by: Zongming Zhang <knightzhang@xxxxxxxxxxx> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Acked-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> --- mm/hugetlb.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6d7296d..a096fb3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4391,6 +4391,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, break; } } + + /* + * If subpage information not requested, update counters + * and skip the same_page loop below. + */ + if (!pages && !vmas && !pfn_offset && + (vaddr + huge_page_size(h) < vma->vm_end) && + (remainder >= pages_per_huge_page(h))) { + vaddr += huge_page_size(h); + remainder -= pages_per_huge_page(h); + i += pages_per_huge_page(h); + spin_unlock(ptl); + continue; + } + same_page: if (pages) { pages[i] = mem_map_offset(page, pfn_offset); -- 1.8.3.1