Re: [PATCH 1/3] memcg: rework softlimit reclaim

Ying Han <yinghan@xxxxxxxxxx> · Wed, 7 Dec 2011 09:39:26 -0800

On Tue, Dec 6, 2011 at 6:13 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Tue,  6 Dec 2011 15:59:57 -0800
> Ying Han <yinghan@xxxxxxxxxx> wrote:
>
>> Under the shrink_zone, we examine whether or not to reclaim from a memcg
>> based on its softlimit. We skip scanning the memcg for the first 3 priority.
>> This is to balance between isolation and efficiency. we don't want to halt
>> the system by skipping memcgs with low-hanging fruits forever.
>>
>> Another change is to set soft_limit_in_bytes to 0 by default. This is needed
>> for both functional and performance:
>>
>> 1. If soft_limit are all set to MAX, it wastes first three periority iterations
>> without scanning anything.
>>
>> 2. By default every memcg is eligibal for softlimit reclaim, and we can also
>> set the value to MAX for special memcg which is immune to soft limit reclaim.
>>
>
> Could you update softlimit doc ?

Will do .
>
>
>
>> Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
>> ---
>>  include/linux/memcontrol.h |    7 ++++
>>  kernel/res_counter.c       |    1 -
>>  mm/memcontrol.c            |    8 +++++
>>  mm/vmscan.c                |   67 ++++++++++++++++++++++++++-----------------
>>  4 files changed, 55 insertions(+), 28 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 81aabfb..53d483b 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -107,6 +107,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
>>                                  struct mem_cgroup_reclaim_cookie *);
>>  void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
>>
>> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *);
>> +
>>  /*
>>   * For memory reclaim.
>>   */
>> @@ -293,6 +295,11 @@ static inline void mem_cgroup_iter_break(struct mem_cgroup *root,
>>  {
>>  }
>>
>> +static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *mem)
>> +{
>> +     return true;
>> +}
>> +
>>  static inline int mem_cgroup_get_reclaim_priority(struct mem_cgroup *memcg)
>>  {
>>       return 0;
>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>> index b814d6c..92afdc1 100644
>> --- a/kernel/res_counter.c
>> +++ b/kernel/res_counter.c
>> @@ -18,7 +18,6 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent)
>>  {
>>       spin_lock_init(&counter->lock);
>>       counter->limit = RESOURCE_MAX;
>> -     counter->soft_limit = RESOURCE_MAX;
>>       counter->parent = parent;
>>  }
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 4425f62..7c6cade 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -926,6 +926,14 @@ out:
>>  }
>>  EXPORT_SYMBOL(mem_cgroup_count_vm_event);
>>
>> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *mem)
>> +{
>> +     if (mem_cgroup_disabled() || mem_cgroup_is_root(mem))
>> +             return true;
>> +
>> +     return res_counter_soft_limit_excess(&mem->res) > 0;
>> +}
>> +
>>  /**
>>   * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
>>   * @zone: zone of the wanted lruvec
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 0ba7d35..b36d91b 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -2091,6 +2091,17 @@ restart:
>>       throttle_vm_writeout(sc->gfp_mask);
>>  }
>>
>> +static bool should_reclaim_mem_cgroup(struct scan_control *sc,
>> +                                   struct mem_cgroup *mem,
>> +                                   int priority)
>> +{
>> +     if (!global_reclaim(sc) || priority <= DEF_PRIORITY - 3 ||
>> +                     mem_cgroup_soft_limit_exceeded(mem))
>> +             return true;
>> +
>> +     return false;
>> +}
>> +
>
> Why "priority <= DEF_PRIORTY - 3" is selected ?
> It seems there is no reason. Could you justify this check ?

There is no particular reason for this magic "3". And the plan is to
open for further tuning later after seeing real problems.

The idea here is to balance out the performance vs isolation. We don't
want to keep trying on "over softlimit memcgs" with hard to reclaim
memory while leaving the "under softlimit memcg" with low-hanging
fruit behind. This hurts the system performance as a whole.

>
> Thinking briefly, can't we caluculate the ratio as
>
>        number of pages in reclaimable memcg / number of reclaimable pages
>
> And use 'priorty' ? If
>
> total_reclaimable_pages >> priority > number of pages in reclaimabe memcg
>
> memcg under softlimit should be scanned..then, we can avoid scanning pages
> twice.

Another thing we were talking about during summit is to reclaim the
pages proportionally based on how much each memcg exceeds its
softlimit, and the calculation above seems to be related to that.

I am pretty sure that we will tune the way to select memcg to reclaim
and how much to relciam while start running into problems, and there
are different ways to tune it. This patch is the very first step to
get started and the main purpose is to get rid of the big giant old
softlimit reclaim implementation.

 > Hmm, please give reason of the magic value here, anyway.
>
>>  static void shrink_zone(int priority, struct zone *zone,
>>                       struct scan_control *sc)
>>  {
>> @@ -2108,7 +2119,9 @@ static void shrink_zone(int priority, struct zone *zone,
>>                       .zone = zone,
>>               };
>>
>> -             shrink_mem_cgroup_zone(priority, &mz, sc);
>> +             if (should_reclaim_mem_cgroup(sc, memcg, priority))
>> +                     shrink_mem_cgroup_zone(priority, &mz, sc);
>> +
>>               /*
>>                * Limit reclaim has historically picked one memcg and
>>                * scanned it with decreasing priority levels until
>> @@ -2152,8 +2165,8 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
>>  {
>>       struct zoneref *z;
>>       struct zone *zone;
>> -     unsigned long nr_soft_reclaimed;
>> -     unsigned long nr_soft_scanned;
>> +//   unsigned long nr_soft_reclaimed;
>> +//   unsigned long nr_soft_scanned;
>
> Why do you leave these things ?

I steal this idea from Johannes's last posted softlimit rework patch.
My understanding is to make
the bisect easier later, maybe I am wrong.

> Hmm, but the whole logic seems clean to me except for magic number.

Thanks.

--Ying

>
> Thanks,
> -Kame
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href