Re: [PATCH 2/3] weight for memcg background reclaim (Was Re: [PATCH V6 00/10] memcg: per cgroup background reclaim)

Ying Han <yinghan@xxxxxxxxxx> · Wed, 20 Apr 2011 23:59:52 -0700

On Wed, Apr 20, 2011 at 11:38 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:

On Wed, 20 Apr 2011 23:11:42 -0700

Ying Han <yinghan@xxxxxxxxxx> wrote:

> On Wed, Apr 20, 2011 at 8:48 PM, KAMEZAWA Hiroyuki <

> kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:

>

> >

> > memcg-kswapd visits each memcg in round-robin order. But the required

> > amount of work depends on each memcg's usage and its hi/low watermarks,

> > and taking that into account would be good.

> >

> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>

> > ---

> >  include/linux/memcontrol.h |    1 +

> >  mm/memcontrol.c            |   17 +++++++++++++++++

> >  mm/vmscan.c                |    2 ++

> >  3 files changed, 20 insertions(+)

> >

> > Index: mmotm-Apr14/include/linux/memcontrol.h

> > ===================================================================

> > --- mmotm-Apr14.orig/include/linux/memcontrol.h

> > +++ mmotm-Apr14/include/linux/memcontrol.h

> > @@ -98,6 +98,7 @@ extern bool mem_cgroup_kswapd_can_sleep(

> >  extern struct mem_cgroup *mem_cgroup_get_shrink_target(void);

> >  extern void mem_cgroup_put_shrink_target(struct mem_cgroup *mem);

> >  extern wait_queue_head_t *mem_cgroup_kswapd_waitq(void);

> > +extern int mem_cgroup_kswapd_bonus(struct mem_cgroup *mem);

> >

> >  static inline

> >  int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup

> > *cgroup)

> > Index: mmotm-Apr14/mm/memcontrol.c

> > ===================================================================

> > --- mmotm-Apr14.orig/mm/memcontrol.c

> > +++ mmotm-Apr14/mm/memcontrol.c

> > @@ -4673,6 +4673,23 @@ struct memcg_kswapd_work

> >

> >  struct memcg_kswapd_work       memcg_kswapd_control;

> >

> > +int mem_cgroup_kswapd_bonus(struct mem_cgroup *mem)

> > +{

> > +       unsigned long long usage, lowat, hiwat;

> > +       int rate;

> > +

> > +       usage = res_counter_read_u64(&mem->res, RES_USAGE);

> > +       lowat = res_counter_read_u64(&mem->res, RES_LOW_WMARK_LIMIT);

> > +       hiwat = res_counter_read_u64(&mem->res, RES_HIGH_WMARK_LIMIT);

> > +       if (lowat == hiwat)

> > +               return 0;

> > +

> > +       rate = (usage - hiwat) * 10 / (lowat - hiwat);

> > +       /* If usage is big, we reclaim more */

> > +       return rate * SWAP_CLUSTER_MAX;

This may be buggy; we should have an upper limit on this 'rate'.

> > +}

> > +

> >

>

>

> > I understand the logic in general: we would like to reclaim more each

> > time if more work needs to be done. But I'm not quite sure about the calculation here.

> > The (usage - hiwat) term determines the amount of work for kswapd, so why divide

> > by (lowat - hiwat)? My guess is that it's because the larger that value, the later we

> > will trigger kswapd?

>

Because memcg-kswapd will require more work on this memcg if usage-high is large.

Agreed on this; that is the idea of "rate" being proportional to (usage-high).

At first, I wasn't sure this logic was good, but I wanted to show there is a chance to

do some scheduling.

We have 2 ways to implement this kind of weight:

 1. modify the logic that selects the memcg

    I think we'd see starvation easily, so I didn't do this for now.

 2. modify the nr_to_reclaim amount

    We'll be able to determine the amount by some calculation using some statistics.

I selected "2" for this time.

With HIGH/LOW watermarks, the admin sets the LOW watermark as a kind of limit. Then,

if usage is above the LOW watermark, that memcg's priority will be higher than that of other memcgs

which have lower (relative) usage.
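
For illustration only, here is a rough userspace sketch of option 2 with the upper limit on 'rate' mentioned above. It is not code from the posted patch: the function name kswapd_bonus_sketch and the cap KSWAPD_BONUS_MAX_RATE (set to 10 here) are hypothetical, and it assumes, as the formula implies, that the LOW watermark is numerically above the HIGH watermark. It also returns 0 when usage is at or below the HIGH watermark, since the unsigned subtraction would otherwise wrap around.

```c
#include <stdint.h>

/* Pages per reclaim batch; 32 matches the kernel's SWAP_CLUSTER_MAX. */
#define SWAP_CLUSTER_MAX 32
/* Hypothetical cap on 'rate'; the posted patch has no such limit. */
#define KSWAPD_BONUS_MAX_RATE 10

/*
 * Sketch of a clamped bonus: the further usage sits above the high
 * watermark, relative to the low/high gap, the more extra pages we
 * ask for in nr_to_reclaim.
 */
static int kswapd_bonus_sketch(uint64_t usage, uint64_t lowat, uint64_t hiwat)
{
	uint64_t rate;

	/* No meaningful gap, or nothing above the high watermark. */
	if (lowat <= hiwat || usage <= hiwat)
		return 0;

	rate = (usage - hiwat) * 10 / (lowat - hiwat);
	/* Assumed upper limit so one memcg cannot inflate the batch unboundedly. */
	if (rate > KSWAPD_BONUS_MAX_RATE)
		rate = KSWAPD_BONUS_MAX_RATE;

	return (int)rate * SWAP_CLUSTER_MAX;
}
```

With hiwat = 100 and lowat = 200, a usage of 150 gives rate 5 (160 extra pages), while a usage of 400 would give rate 30 uncapped and is clamped to 10 (320 extra pages).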

OK, now I know a bit more of the logic behind it. Here, we would like to reclaim more from the memcg which has higher (usage - low).

In general, memcg-kswapd can reduce memory down to the high watermark only when the system is not busy. So, this logic tries to remove more memory from a busy cgroup to reduce 'hit limit' cases.

So, the "busy cgroup" here means the memcg has higher (usage - low)?

--Ying

And I wonder: a memcg contains pages which are related to each other. So, reclaiming

more than 32 pages at once may make sense.

Thanks,

-Kame