On Sun, Nov 26, 2023 at 03:18:54PM +0800, David Wang wrote:
> I add memory access between mmap and munmap to the simple stress, and timeit. It's still not a very good benchmark ...
> My test code now is:
>
> #define MAXN 1024
> struct { void* addr; size_t n; } maps[MAXN];
> void accessit(char *addr, size_t n) {
>     for (int i=0; i<n; i+=128) addr[i]=i;
> }
> int main() {
>     int i, n, k, r;
>     void *p;
>     for (i=0; i<MAXN; i++) {
>         n = 1024*((rand()%32)+1);
>         p = mmap(NULL, n, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

So 'n' is now a number between 1kB and 32kB.  That's not terribly
realistic; I'd say you want to be more like

	n = 4096 * ((rand() % 512) + 1);

>     for (i=0; i<10000000; i++) {
>         k = rand()%MAXN;
> #ifdef PAGE_FAULT
>         accessit((char*)maps[k].addr, maps[k].n);
> #endif
>         r = munmap(maps[k].addr, maps[k].n);
>         if (r) {
>             perror("fail to munmap");
>             return -1;
>         }
>         n = 1024*((rand()%32)+1);
>         p = mmap(NULL, n, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Are you simulating something a real application actually does?  Because
this all seems very weird and micro-benchmark to me.  The real
applications we've benchmarked see a speedup, so I'm not thrilled about
chasing down something that no real application does.

In terms of what's going on in the kernel, for each iteration of the
loop, you're taking between 1 and 8 page faults (a 1kB-32kB mapping
covers one to eight 4kB pages), calling munmap(), then calling mmap().
That may just be too few page faults to see the benefit of the maple
tree.