On Thu, Jun 20, 2019 at 7:24 AM Waiman Long <longman@xxxxxxxxxx> wrote: > > On 6/19/19 7:48 PM, Shakeel Butt wrote: > > Hi Waiman, > > > > On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@xxxxxxxxxx> wrote: > >> There are concerns about memory leaks from extensive use of memory > >> cgroups as each memory cgroup creates its own set of kmem caches. There > >> is a possiblity that the memcg kmem caches may remain even after the > >> memory cgroups have been offlined. Therefore, it will be useful to show > >> the status of each of memcg kmem caches. > >> > >> This patch introduces a new <debugfs>/memcg_slabinfo file which is > >> somewhat similar to /proc/slabinfo in format, but lists only information > >> about kmem caches that have child memcg kmem caches. Information > >> available in /proc/slabinfo are not repeated in memcg_slabinfo. > >> > >> A portion of a sample output of the file was: > >> > >> # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs> > >> rpc_inode_cache root 13 51 1 1 > >> rpc_inode_cache 48 0 0 0 0 > >> fat_inode_cache root 1 45 1 1 > >> fat_inode_cache 41 2 45 1 1 > >> xfs_inode root 770 816 24 24 > >> xfs_inode 92 22 34 1 1 > >> xfs_inode 88:dead 1 34 1 1 > >> xfs_inode 89:dead 23 34 1 1 > >> xfs_inode 85 4 34 1 1 > >> xfs_inode 84 9 34 1 1 > >> > >> The css id of the memcg is also listed. If a memcg is not online, > >> the tag ":dead" will be attached as shown above. > >> > >> Suggested-by: Shakeel Butt <shakeelb@xxxxxxxxxx> > >> Signed-off-by: Waiman Long <longman@xxxxxxxxxx> > >> --- > >> mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 57 insertions(+) > >> > >> diff --git a/mm/slab_common.c b/mm/slab_common.c > >> index 58251ba63e4a..2bca1558a722 100644 > >> --- a/mm/slab_common.c > >> +++ b/mm/slab_common.c > >> @@ -17,6 +17,7 @@ > >> #include <linux/uaccess.h> > >> #include <linux/seq_file.h> > >> #include <linux/proc_fs.h> > >> +#include <linux/debugfs.h> > >> #include <asm/cacheflush.h> > >> #include <asm/tlbflush.h> > >> #include <asm/page.h> > >> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void) > >> return 0; > >> } > >> module_init(slab_proc_init); > >> + > >> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM) > >> +/* > >> + * Display information about kmem caches that have child memcg caches. > >> + */ > >> +static int memcg_slabinfo_show(struct seq_file *m, void *unused) > >> +{ > >> + struct kmem_cache *s, *c; > >> + struct slabinfo sinfo; > >> + > >> + mutex_lock(&slab_mutex); > > On large machines there can be thousands of memcgs and potentially > > each memcg can have hundreds of kmem caches. So, the slab_mutex can be > > held for a very long time. > > But that is also what /proc/slabinfo does by doing mutex_lock() at > slab_start() and mutex_unlock() at slab_stop(). So the same problem will > happen when /proc/slabinfo is being read. > > When you are in a situation that reading /proc/slabinfo take a long time > because of the large number of memcg's, the system is in some kind of > trouble anyway. I am saying that we should not improve the scalability > of this patch. It is just that some nasty race conditions may pop up if > we release the lock and re-acquire it latter. That will greatly > complicate the code to handle all those edge cases. > We have been using that interface and implementation for couple of years and have not seen any race condition. However I am fine with what you have here for now. We can always come back if we think we need to improve it. > > Our internal implementation traverses the memcg tree and then > > traverses 'memcg->kmem_caches' within the slab_mutex (and > > cond_resched() after unlock). > For cgroup v1, the setting of the CONFIG_SLUB_DEBUG option will allow > you to iterate and display slabinfo just for that particular memcg. I am > thinking of extending the debug controller to do similar thing for > cgroup v2. I was also planning to look into that and it seems like you are already on it. Do CC me the patches. > >> + seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>"); > >> + seq_puts(m, " <active_slabs> <num_slabs>\n"); > >> + list_for_each_entry(s, &slab_root_caches, root_caches_node) { > >> + /* > >> + * Skip kmem caches that don't have any memcg children. > >> + */ > >> + if (list_empty(&s->memcg_params.children)) > >> + continue; > >> + > >> + memset(&sinfo, 0, sizeof(sinfo)); > >> + get_slabinfo(s, &sinfo); > >> + seq_printf(m, "%-17s root %6lu %6lu %6lu %6lu\n", > >> + cache_name(s), sinfo.active_objs, sinfo.num_objs, > >> + sinfo.active_slabs, sinfo.num_slabs); > >> + > >> + for_each_memcg_cache(c, s) { > >> + struct cgroup_subsys_state *css; > >> + char *dead = ""; > >> + > >> + css = &c->memcg_params.memcg->css; > >> + if (!(css->flags & CSS_ONLINE)) > >> + dead = ":dead"; > > Please note that Roman's kmem cache reparenting patch series have made > > kmem caches of zombie memcgs a bit tricky. On memcg offlining the > > memcg kmem caches are reparented and the css->id can get recycled. So, > > we want to know that the a kmem cache is reparented and which memcg it > > belonged to initially. Determining if a kmem cache is reparented, we > > can store a flag on the kmem cache and for the previous memcg we can > > use fhandle. However to not make this more complicated, for now, we > > can just have the info that the kmem cache was reparented i.e. belongs > > to an offlined memcg. > > I need to play with Roman's kmem cache reparenting patch a bit more to > see how to properly recognize a reparent'ed kmem cache. What I have > noticed is that the dead kmem caches that I saw at boot up were gone > after applying his patch. So that is a good thing. > By gone, do you mean the kmem cache got freed or the kmem cache is not part of online parent memcg and thus no more dead kmem cache? > For now, I think the current patch is good enough for its purpose. I may > send follow-up if I see something that can be improved. > I would like to see the recognition of reparent'ed kmem cache in this patch. However if others are ok with the current status of the patch then I will not stand in the way. thanks, Shakeel