Re: regression caused by cgroups optimization in 3.17-rc2

Michal Hocko <mhocko@xxxxxxx> · Wed, 10 Sep 2014 18:29:36 +0200

On Fri 05-09-14 11:25:37, Michal Hocko wrote:
> On Thu 04-09-14 13:27:26, Dave Hansen wrote:
> > On 09/04/2014 07:27 AM, Michal Hocko wrote:
> > > Ouch. free_pages_and_swap_cache completely kills the uncharge batching
> > > because it reduces it to PAGEVEC_SIZE batches.
> > > 
> > > I think we really do not need PAGEVEC_SIZE batching anymore. We are
> > > already batching on tlb_gather layer. That one is limited so I think
> > > the below should be safe but I have to think about this some more. There
> > > is a risk of prolonged lru_lock wait times but the number of pages is
> > > limited to 10k and the heavy work is done outside of the lock. If this
> > > is really a problem then we can tear the LRU part and the actual
> > > freeing/uncharging into separate functions in this path.
> > > 
> > > Could you test with this half baked patch, please? I didn't get to test
> > > it myself unfortunately.
> > 
> > 3.16 settled out at about 11.5M faults/sec before the regression.  This
> > patch gets it back up to about 10.5M, which is good.
> 
> Dave, would you be willing to test the following patch as well? I do not
> have a huge machine at hand right now. It would be great if you could

I was playing with a 48-CPU machine with 32G of RAM, but the res_counter
lock didn't show up in the traces much (this was with 96 processes doing
mmap (256M private file), fault, unmap in parallel):
                          |--0.75%-- __res_counter_charge
                          |          res_counter_charge
                          |          try_charge
                          |          mem_cgroup_try_charge
                          |          |          
                          |          |--81.56%-- do_cow_fault
                          |          |          handle_mm_fault
                          |          |          __do_page_fault
                          |          |          do_page_fault
                          |          |          page_fault
[...]
                          |          |          
                          |           --18.44%-- __add_to_page_cache_locked
                          |                     add_to_page_cache_lru
                          |                     mpage_readpages
                          |                     ext4_readpages
                          |                     __do_page_cache_readahead
                          |                     ondemand_readahead
                          |                     page_cache_async_readahead
                          |                     filemap_fault
                          |                     __do_fault
                          |                     do_cow_fault
                          |                     handle_mm_fault
                          |                     __do_page_fault
                          |                     do_page_fault
                          |                     page_fault

Nothing really changed in that regard when I reduced the mmap size to 128M
and ran with 4*CPUs.
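
For reference, the workload described above can be approximated with
something like the sketch below. This is not the actual test program used
for the numbers in this thread; the function names (fault_and_unmap,
run_workers), the /tmp path, the use of one unlinked temporary file per
worker and the iteration count are all assumptions made for illustration.
Each worker maps a file MAP_PRIVATE, writes one byte per page so that every
page takes a CoW write fault, and unmaps again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map `len` bytes of `fd` privately, write one byte to every page and
 * unmap. Returns the number of pages touched, or -1 on error.
 */
static long fault_and_unmap(int fd, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	long touched = 0;
	char *p;

	assert(page > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	for (size_t off = 0; off < len; off += (size_t)page) {
		p[off] = 1;	/* write fault on a private file page */
		touched++;
	}
	if (munmap(p, len) != 0)
		return -1;
	return touched;
}

/*
 * Fork `nproc` workers. Each one creates its own unlinked temporary file
 * of `len` bytes (an assumption, see above) and runs `iters` rounds of
 * mmap/fault/unmap on it. Returns the number of workers that exited
 * successfully.
 */
static int run_workers(int nproc, size_t len, int iters)
{
	int ok = 0;
	int status;

	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;	/* stop forking, still reap what we started */
		if (pid == 0) {
			char path[] = "/tmp/mmap-bench-XXXXXX";
			int fd = mkstemp(path);
			int rc = 1;

			if (fd >= 0) {
				unlink(path);
				if (ftruncate(fd, (off_t)len) == 0) {
					rc = 0;
					for (int j = 0; j < iters; j++) {
						if (fault_and_unmap(fd, len) < 0) {
							rc = 1;
							break;
						}
					}
				}
				close(fd);
			}
			_exit(rc);
		}
	}
	/* Reap all children and count the ones that finished cleanly. */
	while (wait(&status) > 0)
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	return ok;
}
```

Under these assumptions, run_workers(96, 256UL << 20, iters) would roughly
correspond to the 96-process, 256M case, with iters picked to make the run
long enough to profile.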

I do not have a bigger machine to play with, unfortunately. I think the
patch makes sense on its own. I would really appreciate it if you could
give it a try on your machine in the !root memcg case to see how much it
helps. I would expect similar results to your previous testing without
the revert and Johannes' patch.

[...]
-- 
Michal Hocko
SUSE Labs
