Re: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function

On Mon, Jan 06, 2020 at 10:41:22PM +0800, Yafang Shao wrote:
> On Mon, Jan 6, 2020 at 8:17 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > We can clean up the code a lot by getting rid of the unnecessary
> > indenting by doing this:
> >
> >         /* iterate the global lru first */
> >         isolated = list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> >                                         nr_to_walk);
> >         if (!list_lru_memcg_aware(lru))
> >                 return isolated;
> >
> >         nlru = &lru->node[nid];
> >         for_each_mem_cgroup(memcg) {
> >                 /* already scanned the root memcg above */
> >                 if (is_root_memcg(memcg))
> >                         continue;
> >                 if (*nr_to_walk <= 0)
> >                         break;
> >                 spin_lock(&nlru->lock);
> >                 isolated += __list_lru_walk_one(nlru, memcg,
> >                                                 isolate, cb_arg,
> >                                                 nr_to_walk);
> >                 spin_unlock(&nlru->lock);
> >         }
> >         return isolated;
> >
> 
> That's easier to understand.
> I will change it like this in the next version.

Thanks!

> >
> > And so to architecture... This all makes me wonder why we still
> > special case the memcg LRU lists here.
> 
> Couldn't agree more.
> The first time I read this code, I wondered why we don't assign a
> non-negative number to the kmemcg_id of the root_mem_cgroup and then
> use memcg_lrus as well.

Yeah, I've been wondering why the ID is -1 instead of 0 when we
have a global variable that stores the root memcg, which we can
compare pointers directly against via is_root_memcg(). All it seems
to do is make things more complex by forcing us to special case the
root memcg....

From that perspective, I do like your change to use the memcg
iterator functions rather than a for loop over the range of indexes,
but I'd much prefer to see this method used consistently everywhere
rather than the way we've duplicated lots of code by tacking memcgs
onto the side of the non-memcg code paths.
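
For anyone following along: the iterator already exists, but it is
currently private to mm/memcontrol.c, so I assume the series lifts
it into a header. It is a thin wrapper around mem_cgroup_iter():

	#define for_each_mem_cgroup(iter)			\
		for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
		     iter != NULL;				\
		     iter = mem_cgroup_iter(NULL, iter, NULL))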

> > Ever since we went to
> > per-node memcg lru lists (~2015, IIRC), there's been this special
> > hidden hack for the root memcg to use the global list, and one that
> > I have to read lots of code to remind myself exists every time I
> > have to dig into this code again.
> >
> > I mean, if the list is not memcg aware, then it only needs a single
> > list per node - the root memcg list. That could be easily stored in
> > the memcg_lrus array for the node rather than a separate global
> > list, and then the node iteration code all starts to look like this:
> >
> >         nlru = &lru->node[nid];
> >         for_each_mem_cgroup(memcg) {
> >                 spin_lock(&nlru->lock);
> >                 isolated += __list_lru_walk_one(nlru, memcg,
> >                                                 isolate, cb_arg,
> >                                                 nr_to_walk);
> >                 spin_unlock(&nlru->lock);
> >                 if (*nr_to_walk <= 0)
> >                         break;
> >
> >                 /* non-memcg aware lists only have a root memcg list */
> >                 if (!list_lru_memcg_aware(lru))
> >                         break;
> >         }
> >
> > Hence for the walks in the !list_lru_memcg_aware(lru) case, the
> > list_lru_from_memcg() call in __list_lru_walk_one() always returns
> > just the root list. This makes everything use the same iteration
> > interface, and when you configure out memcg then we simply code
> > the iterator to run once and list_lru_from_memcg() always returns
> > the one list...
> >
> 
> Agree with you that it is a better architecture, and I also want to
> change it like this.
> That would be clearer and easier to maintain.
> But it requires lots of code changes; should we address it in a
> separate patchset?

Yes. I think this is a separate piece of work as it spans much more
than just the list-lru infrastructure.
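
To make the end state concrete: with the root memcg in slot 0 of
the memcg_lrus array, the list lookup collapses to something like
the sketch below. This is untested and assumes the root memcg gets
kmemcg_id 0 (today memcg_cache_id() returns -1 for it), which is
exactly the restructuring proposed above:

	static inline struct list_lru_one *
	list_lru_from_memcg(struct list_lru_node *nlru,
			    struct mem_cgroup *memcg)
	{
		struct list_lru_memcg *memcg_lrus;

		/*
		 * Either the node lock or RCU protects the memcg_lrus
		 * array from relocation on memcg hotplug.
		 */
		memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
					lockdep_is_held(&nlru->lock));
		/*
		 * No -1 special case: non-memcg-aware lists simply
		 * never populate anything but the root (slot 0) list.
		 */
		return memcg_lrus->lru[memcg_cache_id(memcg)];
	}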

> > And, FWIW, I noticed that the list_lru memcg code assumes we only
> > ever put objects from memcg associated slab pages in the list_lru.
> > It calls memcg_from_slab_page() which makes no attempt to verify the
> > page is actually a slab page. That's a landmine just waiting to go
> > boom - list-lru can store any type of object the user wants, not
> > just slab pages. e.g. the binder code stores pages in the list-lru,
> > not slab objects, and so the only reason it doesn't go boom is that
> > the lru-list is not configured to be memcg aware....
> >
> 
> Another new issue.
> I will try to diagnose what is hidden in this landmine, and once I
> understand it clearly I will submit a new patchset.

The problem is memcg_from_slab_page() uses page->slab_cache directly
to retrieve the owner memcg without first checking the
PageSlab(page) flag. If it's not a slab page, we need to get the
memcg from page->mem_cgroup, not page->slab_cache->memcg_params.memcg.

see page_cgroup_ino() for how to safely get the owner memcg from a
random page of unknown type...
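
i.e. something along these lines - an untested sketch modelled on
page_cgroup_ino(), with memcg_from_page() being a name I just made
up:

	static struct mem_cgroup *memcg_from_page(struct page *page)
	{
		/* only slab pages store their memcg via slab_cache */
		if (PageSlab(page) && !PageTail(page))
			return memcg_from_slab_page(page);
		/* everything else carries it in page->mem_cgroup */
		return READ_ONCE(page->mem_cgroup);
	}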

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


