Re: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function

On Tue, Jan 7, 2020 at 5:31 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 06, 2020 at 10:41:22PM +0800, Yafang Shao wrote:
> > On Mon, Jan 6, 2020 at 8:17 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > We can clean up the code a lot by getting rid of the unnecessary
> > > indenting by doing this:
> > >
> > >         /* iterate the global lru first */
> > >         isolated = list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > >                                         nr_to_walk);
> > >         if (!list_lru_memcg_aware(lru))
> > >                 return isolated;
> > >
> > >         nlru = &lru->node[nid];
> > >         for_each_mem_cgroup(memcg) {
> > >                 /* already scanned the root memcg above */
> > >                 if (is_root_memcg(memcg))
> > >                         continue;
> > >                 if (*nr_to_walk <= 0)
> > >                         break;
> > >                 spin_lock(&nlru->lock);
> > >                 isolated += __list_lru_walk_one(nlru, memcg,
> > >                                                 isolate, cb_arg,
> > >                                                 nr_to_walk);
> > >                 spin_unlock(&nlru->lock);
> > >         }
> > >         return isolated;
> > >
> >
> > That's easier to understand.
> > I will change it like this in the next version.
>
> Thanks!
>
> > >
> > > And so to architecture... This all makes me wonder why we still
> > > special case the memcg LRU lists here.
> >
> > Can't agree more.
> > The first time I read this code, I wondered why not assign a
> > non-negative number to the kmemcg_id of the root_mem_cgroup and then use
> > memcg_lrus as well.
>
> Yeah, I've been wondering why the ID is -1 instead of 0 when we
> have a global variable that stores the root memcg that we can
> compare pointers directly against via is_root_memcg(). All it seems
> to do is make things more complex by having to special case the root
> memcg....
>
> From that perspective, I do like your change to use the memcg
> iterator functions rather than a for loop over the range of indexes,
> but I'd much prefer to see this method used consistently everywhere
> rather than the way we've duplicated lots of code by tacking memcgs
> onto the side of the non-memcg code paths.
>

Agreed, that would be better.
I will think about it.

> > > Ever since we went to
> > > per-node memcg lru lists (~2015, iirc), there's been this special
> > > hidden hack for the root memcg to use the global list, and one that
> > > I have to read lots of code to remind myself exists every time I
> > > have to dig into this code again.
> > >
> > > I mean, if the list is not memcg aware, then it only needs a single
> > > list per node - the root memcg list. That could be easily stored in
> > > the memcg_lrus array for the node rather than a separate global
> > > list, and then the node iteration code all starts to look like this:
> > >
> > >         nlru = &lru->node[nid];
> > >         for_each_mem_cgroup(memcg) {
> > >                 spin_lock(&nlru->lock);
> > >                 isolated += __list_lru_walk_one(nlru, memcg,
> > >                                                 isolate, cb_arg,
> > >                                                 nr_to_walk);
> > >                 spin_unlock(&nlru->lock);
> > >                 if (*nr_to_walk <= 0)
> > >                         break;
> > >
> > >                 /* non-memcg aware lists only have a root memcg list */
> > >                 if (!list_lru_memcg_aware(lru))
> > >                         break;
> > >         }
> > >
> > > Hence for the walks in the !list_lru_memcg_aware(lru) case, the
> > > list_lru_from_memcg() call in __list_lru_walk_one() always returns
> > > just the root list. This makes everything use the same iteration
> > > interface, and when you configure out memcg then we simply code the
> > > iterator to run once and list_lru_from_memcg() always returns
> > > the one list...
> > >
> >
> > Agree with you that it is a better architecture, and I also want to
> > change it like this.
> > That would be clearer and easier to maintain.
> > But it requires lots of code changes, so should we address it in a
> > separate patchset?
>
> Yes. I think this is a separate piece of work as it spans much more
> than just the list-lru infrastructure.
>

Got it.

> > > And, FWIW, I noticed that the list_lru memcg code assumes we only
> > > ever put objects from memcg associated slab pages in the list_lru.
> > > It calls memcg_from_slab_page() which makes no attempt to verify the
> > > page is actually a slab page. That's a landmine just waiting to go
> > > boom - list-lru can store any type of object the user wants, not
> > > just slab pages. e.g. the binder code stores pages in the list-lru,
> > > not slab objects, and so the only reason it doesn't go boom is that
> > > the lru-list is not configured to be memcg aware....
> > >
> >
> > Another new issue.
> > I will try to diagnose what is hidden in this landmine, and after I
> > understand it clearly I will submit a new patchset.
>
> The problem is memcg_from_slab_page() uses page->slab_cache directly
> to retrieve the owner memcg without first checking the
> PageSlab(page) flag. If it's not a slab page, we need to get the
> memcg from page->memcg, not page->slab_cache->memcg_params.memcg.
>
> see page_cgroup_ino() for how to safely get the owner memcg from a
> random page of unknown type...
>

Understood, many thanks for your explanation.


Thanks
Yafang


