On Fri, May 15, 2020 at 05:08:21PM +0800, Michal Hocko wrote:
> > > > +void mm_compute_batch(void)
> > > >  {
> > > >  	u64 memsized_batch;
> > > >  	s32 nr = num_present_cpus();
> > > >  	s32 batch = max_t(s32, nr*2, 32);
> > > > -
> > > > -	/* batch size set to 0.4% of (total memory/#cpus), or max int32 */
> > > > -	memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff);
> > > > +	unsigned long ram_pages = totalram_pages();
> > > > +
> > > > +	/*
> > > > +	 * For policy of OVERCOMMIT_NEVER, set batch size to 0.4%
> > > > +	 * of (total memory/#cpus), and lift it to 6.25% for other
> > > > +	 * policies to easy the possible lock contention for percpu_counter
> > > > +	 * vm_committed_as, while the max limit is INT_MAX
> > > > +	 */
> > > > +	if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > > > +		memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX);
> > > > +	else
> > > > +		memsized_batch = min_t(u64, ram_pages/nr/16, INT_MAX);
> >
> > Also as you mentioned there are real-world work loads with big mmap
> > size and multi-threading, can we lift it even further ?
> > 	memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX)
>
> Try to measure those and see what numbers look like.

With the same benchmark, the 16X lift in this patch (divisor 256 -> 16)
shows improvements on about 1/3 of the test platforms (servers,
desktops, laptops): up to 20X for servers, and much less on platforms
with fewer CPUs.

If we lift it further to 64X (divisor 256 -> 4), most of the test
platforms show improvements.

Thanks,
Feng
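
For reference, the arithmetic behind the "16X" and "64X" figures can be sketched in userspace as below. This is only an illustration of the memory-sized term from the quoted hunk, not the kernel code itself: the function name memsized_batch() and its parameters are hypothetical stand-ins, and the kernel's per-CPU floor (the separate `batch` variable) is deliberately left out.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the memory-sized term computed in
 * mm_compute_batch(): (total pages / #cpus) / divisor, capped at INT_MAX.
 *
 * divisor 256 is the OVERCOMMIT_NEVER case from the patch,
 * divisor 16  is the 16X lift proposed for the other policies,
 * divisor 4   is the further 64X lift asked about in the thread.
 */
uint64_t memsized_batch(uint64_t ram_pages, uint32_t nr_cpus, uint32_t divisor)
{
	uint64_t batch = ram_pages / nr_cpus / divisor;

	/* Mirror the min_t(u64, ..., INT_MAX) clamp from the hunk. */
	return batch < (uint64_t)INT_MAX ? batch : (uint64_t)INT_MAX;
}
```

As an assumed example, a machine with 64 GiB of 4 KiB pages (16777216 pages) and 64 CPUs gets a batch of 1024 pages with divisor 256, 16384 with divisor 16, and 65536 with divisor 4, i.e. 16 and 64 times the original value.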