[RFC PATCH 0/6] memcg: vfs isolation in memory cgroup

This patchset adds per-memcg isolation of vfs slab objects under reclaim. The
feature is a *must-have* once kernel slab memory accounting starts charging
slab objects to individual memcgs. The existing per-superblock shrinker doesn't
work in that case, since it will end up reclaiming slab objects charged to
other memcgs.

The patch was last rebased on v3.3, and that kernel is now up and running in
our environment. I rebased it on top of v3.5 with few conflicts for posting
here; this post is mainly an RFC for the design.

This patchset has a functional dependency on slab accounting, which it queries
for the owner of a slab object. I left that part commented out so that the
kernel at least compiles for now. The two implementations of kernel slab
accounting, Google's and upstream's, share lots of similarities; the main
difference is how reparenting works under mem_cgroup_destroy(). At Google, we
have the kmem_cache reparented to root, as well as the dentry objects, so
further pressure applied under root will end up reclaiming those objects as
well. Given that the kernel slab accounting feature is still under discussion,
I will leave that aside for this RFC and assume the reparenting to root still
holds.
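
To illustrate the reparenting assumption, here is a minimal userspace sketch.
It is not code from the patchset: struct memcg_model, struct dentry_model,
reparent_to_root() and reclaim() are hypothetical names, and a singly linked
list stands in for the real lru. It only shows the idea that objects left
behind by a dying memcg move to root and become reclaimable under root
pressure.

```c
#include <stddef.h>

/* Hypothetical model types; not the kernel's struct dentry / mem_cgroup. */
struct dentry_model {
    int owner_id;                 /* id of the memcg the object is charged to */
    struct dentry_model *next;    /* next object on the owner's lru */
};

struct memcg_model {
    int id;
    struct dentry_model *lru;     /* head of this memcg's lru */
    size_t nr;                    /* number of objects on the lru */
};

/* Move every object of a dying memcg onto root's lru and re-own it. */
void reparent_to_root(struct memcg_model *root, struct memcg_model *dying)
{
    struct dentry_model *d, *tail = NULL;

    for (d = dying->lru; d; d = d->next) {
        d->owner_id = root->id;
        tail = d;
    }
    if (tail) {
        /* Splice the dying list in front of root's existing list. */
        tail->next = root->lru;
        root->lru = dying->lru;
    }
    root->nr += dying->nr;
    dying->lru = NULL;
    dying->nr = 0;
}

/* Unlink up to nr_to_scan objects from one memcg's lru; return the count. */
size_t reclaim(struct memcg_model *cg, size_t nr_to_scan)
{
    size_t freed = 0;

    while (cg->lru && freed < nr_to_scan) {
        struct dentry_model *d = cg->lru;

        cg->lru = d->next;
        d->next = NULL;
        cg->nr--;
        freed++;
    }
    return freed;
}
```

In this model, calling reparent_to_root() for a child and then reclaim() on
root reaches the child's former objects, which is the behavior the RFC assumes
for pressure applied under root.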

The patch currently handles only the dentry cache, given that dentries pin
inodes. Based on the data we've collected, the dentry cache is the main
contributor to reclaimable slab objects. We could also build a generic
infrastructure for all the shrinkers (if needed). But as we discussed during
the last KS, making dentry work would be a good start. Eventually, that might
be the only thing we care about.

Before getting into the implementation, we did consider other options:
1. Keep the global list but do the filtering during the scan. The performance
was really bad in our tests.
2. Make a per-superblock per-memcg lru list. The implementation would be very
complicated considering all the race conditions.
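
A rough userspace sketch of why option 1 can scan poorly, under stated
assumptions: struct obj, scan_global_filtered() and scan_per_memcg() are
hypothetical names, not functions from the patchset or the kernel. The first
function walks one shared list and skips objects owned by other memcgs; the
second walks a list that holds only the target's objects, shown purely as a
contrast and not as a claim about the data structure the patches use.

```c
#include <stddef.h>

/* Hypothetical model of a cached object on an lru list. */
struct obj {
    int memcg_id;       /* owner memcg of this object */
    struct obj *next;   /* next object on the list */
};

/*
 * Option 1 model: one global list, filter by owner while scanning.
 * Unlinks up to nr_to_free objects owned by target and returns how many
 * were unlinked. *visited counts every object examined, skipped or not.
 */
size_t scan_global_filtered(struct obj **head, int target,
                            size_t nr_to_free, size_t *visited)
{
    size_t freed = 0;
    struct obj **pp = head;

    *visited = 0;
    while (*pp && freed < nr_to_free) {
        struct obj *o = *pp;

        (*visited)++;
        if (o->memcg_id == target) {
            *pp = o->next;      /* unlink the matching object */
            o->next = NULL;
            freed++;
        } else {
            pp = &o->next;      /* skip: charged to another memcg */
        }
    }
    return freed;
}

/* Contrast: a list that contains only the target memcg's objects. */
size_t scan_per_memcg(struct obj **head, size_t nr_to_free, size_t *visited)
{
    size_t freed = 0;

    *visited = 0;
    while (*head && freed < nr_to_free) {
        struct obj *o = *head;

        (*visited)++;
        *head = o->next;
        o->next = NULL;
        freed++;
    }
    return freed;
}
```

With the filtered scan, the number of objects visited depends on how many
foreign objects sit between the target's objects, so a small memcg under
pressure may walk most of the global list. The dedicated list visits exactly
as many objects as it frees.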

The work was started by Andrew Bresticker (a former intern) and was also
greatly inspired by Nikhil Rao <ncrao@xxxxxxxxxx>, Greg Thelen
<gthelen@xxxxxxxxxx> and Suleiman Souhlal <suleiman@xxxxxxxxxx> for the slab
accounting.

Ying Han (6):
  mm: pass priority to prune_icache_sb()
  mm: memcg add target_mem_cgroup, mem_cgroup fields to shrink_control
  mm: memcg restructure shrink_slab to walk memory cgroup hierarchy
  mm: shrink slab with memcg context
  mm: move dcache slabs to root lru when memcg exits
  mm: shrink slab during memcg reclaim

 fs/dcache.c                |  214 ++++++++++++++++++++++++++++++++++++++++---
 fs/inode.c                 |   40 ++++++++-
 fs/super.c                 |   30 +++++--
 include/linux/dcache.h     |    8 ++
 include/linux/fs.h         |   34 ++++++-
 include/linux/memcontrol.h |    8 ++
 include/linux/shrinker.h   |   12 +++
 include/linux/slab_def.h   |    5 +
 mm/memcontrol.c            |   49 ++++++++++
 mm/slab.c                  |    8 ++
 mm/vmscan.c                |   70 +++++++++++----
 11 files changed, 432 insertions(+), 46 deletions(-)

-- 
1.7.7.3
