On Thu, 17 Oct 2024 14:20:12 +0100, willy@xxxxxxxxxxxxx wrote:

> On Thu, Oct 17, 2024 at 02:18:41PM +0800, lizhe.67@xxxxxxxxxxxxx wrote:
> > On Wed, 16 Oct 2024 12:53:15 +0100, willy@xxxxxxxxxxxxx wrote:
> >
> > >On Wed, Oct 16, 2024 at 12:36:00PM +0800, lizhe.67@xxxxxxxxxxxxx wrote:
> > >> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> > >>
> > >> In function collapse_huge_page(), we drop mmap read lock and get
> > >> mmap write lock to prevent most accesses to pagetables. There is
> > >> a small time window to allow other tasks to acquire the mmap lock.
> > >> With the use of upgrade_read(), we don't need to check vma and pmd
> > >> again in most cases.
> > >
> > >This is clearly a performance optimisation.  So you must have some
> > >numbers that justify this, please include them.
> >
> > Yes, I will add the relevant data to v2 patch.
>
> How about telling us all now so we know whether to continue discussing
> this?

In my test environment, collapse_huge_page() only achieved a 0.25%
performance improvement. I used ftrace to measure the execution time of
collapse_huge_page(). The test result, test code and test command are
as follows.

(1) Test result:

average execution time of collapse_huge_page()
before this patch: 1611.06283 us
after this patch:  1597.01474 us

(2) Test code:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MMAP_SIZE (2ul*1024*1024)
/* Round x up to a multiple of mask (mask must be a power of two). */
#define ALIGN(x, mask) (((x) + ((mask)-1)) & ~((mask)-1))

int main(void)
{
	int num = 100;
	size_t page_sz = getpagesize();

	while (num--) {
		size_t index;
		unsigned char *p_map;
		unsigned char *p_map_real;

		/* Map twice the size so a 2M-aligned 2M range always fits. */
		p_map = (unsigned char *)mmap(0, 2 * MMAP_SIZE, PROT_READ | PROT_WRITE,
					      MAP_PRIVATE | MAP_ANON, -1, 0);
		if (p_map == MAP_FAILED) {
			printf("mmap fail\n");
			return -1;
		} else {
			p_map_real = (unsigned char *)ALIGN((unsigned long)p_map, MMAP_SIZE);
			printf("mmap get %p, align to %p\n", (void *)p_map, (void *)p_map_real);
		}

		/* Touch every base page so the whole range is populated. */
		for (index = 0; index < MMAP_SIZE; index += page_sz)
			p_map_real[index] = 6;

		/* 25 is MADV_COLLAPSE */
		int ret = madvise(p_map_real, MMAP_SIZE, 25);
		printf("ret is %d\n", ret);

		munmap(p_map, 2 * MMAP_SIZE);
	}

	return 0;
}

(3) Test command:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
gcc test.c -o test
trace-cmd record -p function_graph -g collapse_huge_page --max-graph-depth 1 ./test

The optimization of collapse_huge_page() seems insignificant. I am not
sure whether it would have a more noticeable effect in other scenarios.