On Tue, 21 May 2013 16:27:29 -0700 Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:

> Will something like the following work if we get rid of the percpu
> counter changes and use __percpu_counter_add(..., batch)?  In a
> benchmark with a lot of memory changes via brk, this makes quite
> a difference when we go to a bigger batch size.

That looks pretty close.

> Tim
>
> Change batch size for memory accounting to be proportional to memory available.
>
> Currently the per cpu counter's batch size for memory accounting is
> configured as twice the number of cpus in the system.  However,
> for systems with very large memory, it is more appropriate to make it
> proportional to the memory size per cpu in the system.
>
> For example, for an x86_64 system with 64 cpus and 128 GB of memory,
> the batch size is only 2*64 pages (0.5 MB).  So any memory accounting
> changes of more than 0.5 MB will overflow the per cpu counter into
> the global counter.  Instead, for the new scheme, the batch size
> is configured to be 0.4% of the memory/cpu = 8 MB (128 GB/64 /256),
> which is more in line with the memory size.
>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> ---
>  include/linux/mman.h |  5 +++++
>  mm/mmap.c            | 14 ++++++++++++++
>  mm/nommu.c           | 14 ++++++++++++++
>  3 files changed, 33 insertions(+)
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 9aa863d..11d5ce9 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -10,12 +10,17 @@
>  extern int sysctl_overcommit_memory;
>  extern int sysctl_overcommit_ratio;
>  extern struct percpu_counter vm_committed_as;
> +extern int vm_committed_as_batch;
>
>  unsigned long vm_memory_committed(void);
>
>  static inline void vm_acct_memory(long pages)
>  {
> +#ifdef CONFIG_SMP
> +	__percpu_counter_add(&vm_committed_as, pages, vm_committed_as_batch);
> +#else
>  	percpu_counter_add(&vm_committed_as, pages);
> +#endif
>  }

I think we could use __percpu_counter_add() unconditionally here and just
do

#ifdef CONFIG_SMP
extern int vm_committed_as_batch;
#else
#define vm_committed_as_batch 0
#endif

The EXPORT_SYMBOL(vm_committed_as_batch) is unneeded.
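To make the role of the batch argument concrete, here is a minimal user-space sketch of the fold-on-threshold behaviour that the patch description refers to as overflowing into the global counter. It models a single CPU only, and struct model_counter and model_counter_add() are hypothetical names made up for illustration, not kernel API. The real __percpu_counter_add() also handles locking and preemption, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-CPU model; not the kernel's struct percpu_counter. */
struct model_counter {
	int64_t global;	/* shared count (lock-protected in the kernel) */
	int64_t local;	/* this CPU's pending delta */
};

/*
 * Add amount to the local delta. Once the delta's magnitude reaches
 * batch, fold it into the global count and reset the local delta.
 * Returns 1 if a fold happened, 0 otherwise.
 */
static int model_counter_add(struct model_counter *c, int64_t amount,
			     int32_t batch)
{
	int64_t count;

	assert(batch >= 0);
	count = c->local + amount;
	if (count >= batch || count <= -batch) {
		c->global += count;
		c->local = 0;
		return 1;
	}
	c->local = count;
	return 0;
}
```

With a batch of 128, two additions of 100 pages touch the global count only once. With a batch of 0, every addition folds immediately in this model, which is one way to read why a batch of 0 is a harmless value for the non-SMP case.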
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3145,11 +3145,25 @@ void mm_drop_all_locks(struct mm_struct *mm)
>  /*
>   * initialise the VMA slab
>   */
> +
> +int vm_committed_as_batch;
> +EXPORT_SYMBOL(vm_committed_as_batch);
> +
> +static int mm_compute_batch(void)
> +{
> +	int nr = num_present_cpus();
> +	int batch = max(32, nr*2);
> +
> +	/* batch size set to 0.4% of (total memory/#cpus) */
> +	return max((int) (totalram_pages/nr) / 256, batch);
> +}

Change this to do the assignment to vm_committed_as_batch then put this
code inside #ifdef CONFIG_SMP and do

#else	/* CONFIG_SMP */
static inline void mm_compute_batch(void)
{
}
#endif

>  void __init mmap_init(void)
>  {
>  	int ret;
>
>  	ret = percpu_counter_init(&vm_committed_as, 0);
> +	vm_committed_as_batch = mm_compute_batch();

This becomes just

	mm_compute_batch();

>  	VM_BUG_ON(ret);
>  }
>
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 298884d..1b7008a 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -527,11 +527,25 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>  /*
>   * initialise the VMA and region record slabs
>   */
> +
> +int vm_committed_as_batch;
> +EXPORT_SYMBOL(vm_committed_as_batch);
> +
> +static int mm_compute_batch(void)
> +{
> +	int nr = num_present_cpus();
> +	int batch = max(32, nr*2);
> +
> +	/* batch size set to 0.4% of (total memory/#cpus) */
> +	return max((int) (totalram_pages/nr) / 256, batch);
> +}
> +
>  void __init mmap_init(void)
>  {
>  	int ret;
>
>  	ret = percpu_counter_init(&vm_committed_as, 0);
> +	vm_committed_as_batch = mm_compute_batch();
>  	VM_BUG_ON(ret);
>  	vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);

I'm not sure that CONFIG_MMU=n && CONFIG_SMP=y even exists.  Perhaps it
does.  But there's no point in ruling out that option here.

The nommu code becomes identical to the mmu code so we should put it in
a shared file.  I suppose mmap.c would be as good a place as any.

We could make mm_compute_batch() __init and call it from mm_init().
But really it should be __meminit and there should be a memory-hotplug
notifier handler which adjusts vm_committed_as_batch's value.
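
As a rough check of the numbers in the changelog, the batch calculation can be sketched in user space. compute_batch() below is a hypothetical stand-in for mm_compute_batch(): it takes the page count and CPU count as parameters instead of reading totalram_pages and num_present_cpus(), and it assumes 4 KB pages when turning the result back into bytes. It also divides before narrowing to int, which is a small departure from the quoted patch.

```c
#include <assert.h>

/*
 * Hypothetical user-space stand-in for mm_compute_batch().
 * total_pages and nr_cpus are passed in rather than read from
 * totalram_pages and num_present_cpus().
 */
static int compute_batch(unsigned long total_pages, int nr_cpus)
{
	int min_batch, scaled;

	assert(nr_cpus > 0);

	/* Lower bound from the quoted patch: max(32, 2 * cpus). */
	min_batch = nr_cpus * 2 > 32 ? nr_cpus * 2 : 32;

	/* 1/256 (about 0.4%) of the pages available per CPU. */
	scaled = (int)(total_pages / (unsigned long)nr_cpus / 256);

	return scaled > min_batch ? scaled : min_batch;
}
```

For the changelog's example, 128 GB is 33554432 pages of 4 KB, so compute_batch(33554432UL, 64) gives 2048 pages, which is the 8 MB quoted in the patch description, against a lower bound of 128 pages. A memory-hotplug handler of the kind suggested above could, under the same assumptions, rerun the calculation with the updated page count.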