On 10/15/2011 04:38 AM, Suleiman Souhlal wrote:
Signed-off-by: Suleiman Souhlal <suleiman@xxxxxxxxxx>
---
Documentation/cgroups/memory.txt | 33 ++++++++++++++++++++++++++++++++-
1 files changed, 32 insertions(+), 1 deletions(-)
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 06eb6d9..277cf25 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -220,7 +220,37 @@ caches are dropped. But as mentioned above, global LRU can do swapout memory
from it for sanity of the system's memory management state. You can't forbid
it by cgroup.
-2.5 Reclaim
+2.5 Kernel Memory
+
+A cgroup's kernel memory is accounted into its memory.usage_in_bytes and
+is also shown in memory.stat as kernel_memory. Kernel memory does not get
+counted towards the root cgroup's memory.usage_in_bytes, but still
+appears in its kernel_memory.
+
+Upon cgroup deletion, all the remaining kernel memory gets moved to the
+root cgroup.
+
+An accounted kernel memory allocation may trigger reclaim in that cgroup,
+and may also trigger the OOM killer.
+
+Currently, only slab memory allocated with neither __GFP_NOACCOUNT nor
+__GFP_NOFAIL gets accounted to the current process's cgroup.
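To make the accounting rules above concrete, here is a minimal userspace sketch in C. It assumes a flat, non-hierarchical model, and struct memcg, kmem_charge() and kmem_reparent_to_root() are hypothetical names, not code from the patch. Kernel memory always shows up in kernel_memory, counts towards usage_in_bytes everywhere except the root cgroup, and is handed over to the root's kernel_memory when a cgroup is deleted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the section 2.5 accounting rules;
 * names are illustrative only, and hierarchy is ignored. */
struct memcg {
	struct memcg *parent;		/* NULL for the root cgroup */
	unsigned long usage_in_bytes;	/* reported by memory.usage_in_bytes */
	unsigned long kernel_memory;	/* reported as kernel_memory in memory.stat */
};

static bool is_root(const struct memcg *cg)
{
	return cg->parent == NULL;
}

/* Charge an accounted kernel allocation of @bytes to @cg. */
static void kmem_charge(struct memcg *cg, unsigned long bytes)
{
	cg->kernel_memory += bytes;
	/* The root cgroup shows kernel memory only in kernel_memory. */
	if (!is_root(cg))
		cg->usage_in_bytes += bytes;
}

/* On deletion, the remaining kernel memory of @cg moves to @root. */
static void kmem_reparent_to_root(struct memcg *cg, struct memcg *root)
{
	assert(is_root(root) && !is_root(cg));
	assert(cg->usage_in_bytes >= cg->kernel_memory);
	root->kernel_memory += cg->kernel_memory;	/* not added to root usage */
	cg->usage_in_bytes -= cg->kernel_memory;
	cg->kernel_memory = 0;
}
```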
+
+2.5.1 Slab
+
+Slab gets accounted on a per-page basis, using per-cgroup
+kmem_caches. These per-cgroup kmem_caches are created on demand, the first
+time a specific kmem_cache is used by a cgroup.
Well, let me first start with some general comments:
I think the approach I've taken, which is to allow cache creators
to register themselves for cgroup usage, is better than scanning the
list of existing caches. A couple of key reasons:
1) We then don't need another flag: __GFP_NOACCOUNT becomes simply doing nothing.
2) Less pollution of the slab structure itself, which gives it a better
chance of inclusion, and less duplicated work in SLUB.
3) Easier to do per-cache tuning if we ever want to.
About on-demand creation: I think it is a nice idea, but it may impact
allocation latency for caches that we are sure will be used, like the
dentry cache. So that gives us:
4) If the cache creator is registering itself, it can specify which
behavior it wants: on-demand creation or straight (immediate) creation.
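The registration idea in points 1) and 4) could look roughly like the sketch below. It is a userspace model only: memcg_register_cache(), enum memcg_cache_mode and the per_cgroup[] array are hypothetical stand-ins for real per-cgroup kmem_caches. An unregistered cache is never accounted (the __GFP_NOACCOUNT case), a straight cache gets its per-cgroup copy when the cgroup is created, and an on-demand cache gets it on the first allocation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CGROUPS 8

/* Hypothetical creation policies a cache creator could ask for. */
enum memcg_cache_mode {
	MEMCG_CACHE_ON_DEMAND,	/* create the per-cgroup copy on first use */
	MEMCG_CACHE_STRAIGHT,	/* create it as soon as the cgroup exists */
};

struct cache {
	const char *name;
	bool registered;		/* unregistered caches are never accounted */
	enum memcg_cache_mode mode;
	bool per_cgroup[MAX_CGROUPS];	/* stands in for per-cgroup kmem_caches */
};

/* Called by the cache creator to opt in to cgroup accounting. */
static void memcg_register_cache(struct cache *c, enum memcg_cache_mode mode)
{
	c->registered = true;
	c->mode = mode;
}

/* Called when cgroup @id is created. */
static void memcg_cgroup_created(struct cache *c, int id)
{
	assert(id >= 0 && id < MAX_CGROUPS);
	if (c->registered && c->mode == MEMCG_CACHE_STRAIGHT)
		c->per_cgroup[id] = true;	/* no extra latency on first allocation */
}

/* Allocation path: returns true if the allocation is accounted to @id. */
static bool memcg_cache_alloc(struct cache *c, int id)
{
	assert(id >= 0 && id < MAX_CGROUPS);
	if (!c->registered)
		return false;			/* do nothing: use the global cache */
	if (!c->per_cgroup[id])
		c->per_cgroup[id] = true;	/* create the copy on first use */
	return true;
}
```

With this split, a cache like dentry can pay the creation cost up front, while rarely used caches only pay it if a cgroup actually allocates from them.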
+Slab memory that cannot be attributed to a cgroup gets charged to the root
+cgroup.
+
+A per-cgroup kmem_cache is named like the original, with the cgroup's name
+in parentheses.
I used the address for simplicity, but I like names better. Agree here.
Extending this: if a task resides in the cgroup itself, I think it should
see only its own caches in /proc/slabinfo (selectable; take a look at
https://lkml.org/lkml/2011/10/6/132 for more details).
+When a kmem_cache gets migrated to the root cgroup, "dead" is appended to
+its name, to indicate that it is not going to be used for new allocations.
Why not just remove it?
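For reference, the naming scheme from the patch fits in a couple of snprintf() calls. These are hypothetical helpers, and appending "dead" directly after the closing parenthesis is an assumption, since the text only says that "dead" is appended to the name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "cache(cgroup)", e.g. "dentry(mycg)".
 * Returns false if the name had to be truncated to fit in @len bytes. */
static bool memcg_cache_name(char *buf, size_t len, const char *cache,
			     const char *cgroup)
{
	int n = snprintf(buf, len, "%s(%s)", cache, cgroup);

	return n >= 0 && (size_t)n < len;
}

/* Append "dead" when the cache is migrated to the root cgroup.
 * Returns false if the suffix did not fit. */
static bool memcg_cache_mark_dead(char *buf, size_t len)
{
	size_t used = strlen(buf);
	int n;

	assert(used < len);	/* @buf must already hold a terminated name */
	n = snprintf(buf + used, len - used, "dead");
	return n >= 0 && (size_t)n < len - used;
}
```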
+2.6 Reclaim
Each cgroup maintains a per cgroup LRU which has the same structure as
global VM. When a cgroup goes over its limit, we first try
@@ -396,6 +426,7 @@ active_anon - # of bytes of anonymous and swap cache memory on active
inactive_file - # of bytes of file-backed memory on inactive LRU list.
active_file - # of bytes of file-backed memory on active LRU list.
unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
+kernel_memory - # of bytes of kernel memory.
# status considering hierarchy (see memory.use_hierarchy settings)
Some other comments:
* I think using res_counters is better than relying on slab fields to
impose limits.
* We still need the ability to restrict kernel memory usage separately
from user memory, depending on a selectable option, as we already discussed here.
* I think we should do everything in our power to reduce overhead for
the special case in which only the root cgroup exists. Take a look at
what happened in the following thread:
https://lkml.org/lkml/2011/10/13/201. To be honest, I think it is an
idea we should at least consider: not accounting *anything*, user
memory or kernel memory, to the root cgroup (with a selectable option if
we want to preserve the current behaviour). Then we can keep native
performance for non-cgroup users. (But that's another discussion anyway.)
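To illustrate the first two points, here is a simplified userspace model of res_counter-style charging with a separate kernel memory counter. It is not the kernel's res_counter API (no locking, no failcnt, no limit_fail_at reporting), and struct counter, struct memcg and memcg_charge_kmem() are hypothetical. A charge walks up the hierarchy and is rolled back if any level would exceed its limit, and a kernel allocation has to fit under both the kmem limit and the overall limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified res_counter-like counter; invariant: usage <= limit. */
struct counter {
	unsigned long usage;
	unsigned long limit;
	struct counter *parent;	/* NULL at the top of the hierarchy */
};

/* Charge @val to @c and all its ancestors, or to none of them. */
static bool counter_charge(struct counter *c, unsigned long val)
{
	struct counter *i, *j;

	for (i = c; i; i = i->parent) {
		if (val > i->limit - i->usage) {
			/* Roll back the levels already charged. */
			for (j = c; j != i; j = j->parent)
				j->usage -= val;
			return false;
		}
		i->usage += val;
	}
	return true;
}

static void counter_uncharge(struct counter *c, unsigned long val)
{
	for (; c; c = c->parent) {
		assert(c->usage >= val);
		c->usage -= val;
	}
}

/* Hypothetical memcg with a separate kernel memory limit. */
struct memcg {
	struct counter res;	/* user + kernel memory */
	struct counter kmem;	/* kernel memory only */
};

/* A kernel allocation must fit under both limits. */
static bool memcg_charge_kmem(struct memcg *cg, unsigned long val)
{
	if (!counter_charge(&cg->kmem, val))
		return false;
	if (!counter_charge(&cg->res, val)) {
		counter_uncharge(&cg->kmem, val);
		return false;
	}
	return true;
}
```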
All in all, this is a good start. Both our approaches have a lot in
common (which is not strange, given that we discussed them a lot
over the past month =p), and I did like some of your concepts.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .