Re: regression caused by cgroups optimization in 3.17-rc2

Dave Hansen <dave@xxxxxxxx> · Tue, 02 Sep 2014 13:57:22 -0700

I, of course, forgot to include the most important detail.  This appears
to be pretty run-of-the-mill spinlock contention in the resource counter
code.  Nearly 80% of the CPU is spent spinning in the charge or uncharge
paths in the kernel.  It is apparently spinning on res_counter->lock in
both the charge and uncharge paths.

It already does _some_ batching here on the free side, but that
apparently breaks down after ~40 threads.

It's a no-brainer since the patch in question removed an optimization
skipping the charging, and now we're seeing overhead from the charging.

Here's the first entry from perf top:

    80.18%    80.18%  [kernel]               [k] _raw_spin_lock
                  |
                  --- _raw_spin_lock
                     |
                     |--66.59%-- res_counter_uncharge_until
                     |          res_counter_uncharge
                     |          uncharge_batch
                     |          uncharge_list
                     |          mem_cgroup_uncharge_list
                     |          release_pages
                     |          free_pages_and_swap_cache
                     |          tlb_flush_mmu_free
                     |          |
                     |          |--90.12%-- unmap_single_vma
                     |          |          unmap_vmas
                     |          |          unmap_region
                     |          |          do_munmap
                     |          |          vm_munmap
                     |          |          sys_munmap
                     |          |          system_call_fastpath
                     |          |          __GI___munmap
                     |          |
                     |           --9.88%-- tlb_flush_mmu
                     |                     tlb_finish_mmu
                     |                     unmap_region
                     |                     do_munmap
                     |                     vm_munmap
                     |                     sys_munmap
                     |                     system_call_fastpath
                     |                     __GI___munmap
                     |
                     |--46.13%-- __res_counter_charge
                     |          res_counter_charge
                     |          try_charge
                     |          mem_cgroup_try_charge
                     |          |
                     |          |--99.89%-- do_cow_fault
                     |          |          handle_mm_fault
                     |          |          __do_page_fault
                     |          |          do_page_fault
                     |          |          page_fault
                     |          |          testcase
                     |           --0.11%-- [...]
                     |
                     |--1.14%-- do_cow_fault
                     |          handle_mm_fault
                     |          __do_page_fault
                     |          do_page_fault
                     |          page_fault
                     |          testcase
                      --8217937613.29%-- [...]

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>