On Thu, Jul 19, 2012 at 04:34:26PM -0700, Tim Chen wrote:
> Hi,
>
> I noticed in a multi-process parallel files reading benchmark I ran on a
> 8 socket machine, throughput slowed down by a factor of 8 when I ran
> the benchmark within a cgroup container. I traced the problem to the
> following code path (see below) when we are trying to reclaim memory
> from file cache. The res_counter_uncharge function is called on every
> page that's reclaimed and created heavy lock contention. The patch
> below allows the reclaimed pages to be uncharged from the resource
> counter in batch and recovered the regression.
>
> Tim
>
>  40.67%  usemem  [kernel.kallsyms]  [k] _raw_spin_lock
>          |
>          --- _raw_spin_lock
>             |
>             |--92.61%-- res_counter_uncharge
>             |          |
>             |          |--100.00%-- __mem_cgroup_uncharge_common
>             |          |          |
>             |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
>             |          |          |          __remove_mapping
>             |          |          |          shrink_page_list
>             |          |          |          shrink_inactive_list
>             |          |          |          shrink_mem_cgroup_zone
>             |          |          |          shrink_zone
>             |          |          |          do_try_to_free_pages
>             |          |          |          try_to_free_pages
>             |          |          |          __alloc_pages_nodemask
>             |          |          |          alloc_pages_current
>
>
> ---
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>

Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>

-- 
 Kirill A. Shutemov