On Tue, May 26, 2020 at 06:14:13PM -0700, Andi Kleen wrote:
> On Tue, May 26, 2020 at 02:14:59PM -0400, Qian Cai wrote:
> > On Thu, May 21, 2020 at 05:27:26PM -0400, Qian Cai wrote:
> > > On Fri, May 08, 2020 at 03:25:14PM +0800, Feng Tang wrote:
> > > > When checking a performance change for the will-it-scale scalability
> > > > mmap test [1], we found very high lock contention on the spinlock of
> > > > the percpu counter 'vm_committed_as':
> > > >
> > > >     94.14%  0.35%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
> > > >     48.21%  _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
> > > >     45.91%  _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;
> > > >
> > > > Actually, this heavy lock contention is not always necessary. The
> > > > 'vm_committed_as' counter only needs to be very precise when the
> > > > strict OVERCOMMIT_NEVER policy is set, which requires a rather small
> > > > batch number for the percpu counter.
> > > >
> > > > So lift the batch number to 16X for the OVERCOMMIT_ALWAYS and
> > > > OVERCOMMIT_GUESS policies, and add a sysctl handler to adjust it
> > > > when the policy is reconfigured.
> > > >
> > > > A benchmark with the same testcase as in [1] shows a 53% improvement
> > > > on an 8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. Some
> > > > platforms show no change, because the test's mmap size is bigger than
> > > > the computed batch number, though the patch helps mmap/munmap
> > > > generally.
> > > >
> > > > [1] https://lkml.org/lkml/2020/3/5/57
> > > >
> > > > There are some style complaints from checkpatch for patch 3, as it
> > > > follows the format of its sibling functions.
> > > >
> > > > patch1: a cleanup for /proc/meminfo
> > > > patch2: a preparation patch which also improves the accuracy of
> > > >         vm_memory_committed
> > > > patch3: the main change
> > > >
> > > > Please help to review, thanks!
> > >
> > > Reverting this series fixed a warning under memory pressure.
> >
> > Andrew, Stephen, can you drop this series?
>
> Hope you're happy now.
>
> The warning is just not needed for the always case. The whole point
> of the patch was to maintain these counters only approximately.
> We could of course use _sum or _sum_positive in the warning, but that
> would just be unnecessary overhead in this moderately hot path.
>
> So this patch should fix it: (untested)
>
> Possibly it might need to move even further down, because the patch
> also affected the _GUESS case.

Thanks, Andi, for the patch. And I agree that it could go after the
_GUESS case as well, as the existing check for _GUESS is loose:

	if (pages > totalram_pages() + total_swap_pages)
		goto error;

Thanks,
Feng

>
> diff --git a/mm/util.c b/mm/util.c
> index 988d11e6c17c..fa78f90e29a1 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -798,10 +798,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  {
>  	long allowed;
>
> -	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
> -			-(s64)vm_committed_as_batch * num_online_cpus(),
> -			"memory commitment underflow");
> -
>  	vm_acct_memory(pages);
>
>  	/*
> @@ -810,6 +806,10 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  	if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)
>  		return 0;
>
> +	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
> +			-(s64)vm_committed_as_batch * num_online_cpus(),
> +			"memory commitment underflow");
> +
>  	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>  		if (pages > totalram_pages() + total_swap_pages)
>  			goto error;
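
For reference, the _sum variant you mention would look roughly like the
(untested) sketch below; percpu_counter_sum() folds in every CPU's
pending delta, so the underflow check becomes exact and needs no
batch-based slack, at the cost of walking all online CPUs on this
moderately hot path:

	/*
	 * Untested sketch of the exact-sum alternative, not part of the
	 * posted patch: summing the per-cpu deltas makes the check
	 * precise, so a plain "< 0" comparison is enough.
	 */
	VM_WARN_ONCE(percpu_counter_sum(&vm_committed_as) < 0,
		     "memory commitment underflow");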