On 03/11/2022 05:20 PM, David Hildenbrand wrote:
> On 11.03.22 10:01, Bibo Mao wrote:
>> collapse huge page is slow, specially when khugepaged daemon runs
>> on different numa node with that of huge page. It suffers from
>> huge page copying across nodes, also cache is not used for target
>> node. With this patch, khugepaged daemon switches to the same numa
>> node with huge page. It saves copying time and makes use of local
>> cache better.
>
> Hi,
>
> just the usual question, do you have any performance numbers to back
> your claims (e.g., "is slow, specially when") and proof that this patch
> does the trick?

With SPECint 2006 on a LoongArch 3C5000L 32-core NUMA system, it improves
performance by about 6%. The base page size is 16K and the PMD page size
is 32M, so memory performance differs noticeably across NUMA nodes.

However, I have not tested it on an x86 box.

>
>
>>
>> Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
>> ---
>>  mm/khugepaged.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 131492fd1148..460c285dc974 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -116,6 +116,7 @@ struct khugepaged_scan {
>>  	struct list_head mm_head;
>>  	struct mm_slot *mm_slot;
>>  	unsigned long address;
>> +	int node;
>>  };
>>
>>  static struct khugepaged_scan khugepaged_scan = {
>> @@ -1066,6 +1067,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>>  	struct vm_area_struct *vma;
>>  	struct mmu_notifier_range range;
>>  	gfp_t gfp;
>> +	const struct cpumask *cpumask;
>
> We tend to stick to reverse Christmas tree format as good as possible.
>
>>
>>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>>
>> @@ -1079,6 +1081,13 @@ static void collapse_huge_page(struct mm_struct *mm,
>>  	 * that. We will recheck the vma after taking it again in write mode.
>>  	 */
>>  	mmap_read_unlock(mm);
>> +
>> +	/* sched to specified node before huage page memory copy */
>
> s/huage/huge/
>
>> +	cpumask = cpumask_of_node(node);
>> +	if ((khugepaged_scan.node != node) && !cpumask_empty(cpumask)) {
>> +		set_cpus_allowed_ptr(current, cpumask);
>> +		khugepaged_scan.node = node;
>> +	}
>>  	new_page = khugepaged_alloc_page(hpage, gfp, node);
>>  	if (!new_page) {
>>  		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
>> @@ -2380,6 +2389,7 @@ int start_stop_khugepaged(void)
>>  		kthread_stop(khugepaged_thread);
>>  		khugepaged_thread = NULL;
>>  	}
>> +	khugepaged_scan.node = NUMA_NO_NODE;
>>  	set_recommended_min_free_kbytes();
>>  fail:
>>  	mutex_unlock(&khugepaged_mutex);
>
>
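
For reference, the cross-node copy cost the patch targets can be
reproduced from userspace. Below is a minimal sketch (not the kernel
code path itself) using libnuma: numa_run_on_node() plays the role that
set_cpus_allowed_ptr() plays for khugepaged, migrating the copying
thread onto the node holding the memory. It assumes a system with at
least two nodes; the node numbers and the 32M buffer size (matching the
PMD size quoted above) are arbitrary choices for illustration. Build
with: gcc -O2 copy_bench.c -lnuma

	/* copy_bench.c: compare memcpy with the thread local vs. remote
	 * to the memory's NUMA node. Illustrative only. */
	#include <numa.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	#define BUF_SIZE (32UL << 20)	/* 32M, the PMD page size above */

	static double copy_ms(void *dst, void *src)
	{
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		memcpy(dst, src, BUF_SIZE);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		return (t1.tv_sec - t0.tv_sec) * 1e3 +
		       (t1.tv_nsec - t0.tv_nsec) / 1e6;
	}

	int main(void)
	{
		void *src, *dst;

		if (numa_available() < 0 || numa_max_node() < 1) {
			fprintf(stderr, "need >= 2 NUMA nodes\n");
			return EXIT_FAILURE;
		}

		/* Both buffers bound to node 0, like the collapsed huge
		 * page's target node. */
		src = numa_alloc_onnode(BUF_SIZE, 0);
		dst = numa_alloc_onnode(BUF_SIZE, 0);
		if (!src || !dst)
			return EXIT_FAILURE;
		memset(src, 1, BUF_SIZE);
		memset(dst, 0, BUF_SIZE);

		/* Remote copy: thread on node 1, memory on node 0. */
		numa_run_on_node(1);
		printf("remote copy: %.2f ms\n", copy_ms(dst, src));

		/* Local copy: migrate the thread to node 0 first, as the
		 * patch does for khugepaged. */
		numa_run_on_node(0);
		printf("local copy:  %.2f ms\n", copy_ms(dst, src));

		numa_free(src, BUF_SIZE);
		numa_free(dst, BUF_SIZE);
		return 0;
	}

The gap between the two printed times is a rough proxy for the copy-time
saving the commit message claims; exact numbers will of course depend on
the interconnect and cache sizes of the machine.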