Re: [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages

On Tue, Jun 19, 2012 at 5:05 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> On Mon, Jun 18, 2012 at 09:47:31AM -0700, Ying Han wrote:
>> The function zone_reclaimable() marks zone->all_unreclaimable based on
>> per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
>> alloc_pages could go to OOM instead of getting stuck in page reclaim.
>
> There is no zone->all_unreclaimable at this point, you removed it in
> the previous patch.
>
>> In memcg kernel, cgroup under its softlimit is not targeted under global
>> reclaim. So we need to remove those pages from reclaimable_pages, otherwise
>> it will cause reclaim mechanism to get stuck trying to reclaim from
>> all_unreclaimable zone.
>
> Can't you check if zone->pages_scanned changed in between reclaim
> runs?
>
> Or sum up the scanned and reclaimable pages encountered while
> iterating the hierarchy during regular reclaim and then use those
> numbers in the equation instead of the per-zone counters?
>
> Walking the full global hierarchy in all the places where we check if
> a zone is reclaimable is a scalability nightmare.

One way to solve this is to record the per-zone reclaimable pages (the
sum of reclaimable pages of memcgs above their softlimits) after each
shrink_zone(). The latter function already walks the memcg hierarchy and
checks the softlimit, so we don't need to do it again. The new
value, pages_reclaimed, is recorded per-zone, and the caller can
compare it with zone->pages_scanned.
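
A minimal userspace sketch of that idea follows. It is a model under
stated assumptions, not kernel code: struct memcg_model, struct
zone_model and both functions are hypothetical stand-ins, and the factor
of 6 is borrowed from the zone_reclaimable() heuristic in kernels of
this period. The point is only that the sum is taken during the walk
that shrink_zone() performs anyway, so the later check is O(1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins; these are NOT the kernel structures. */
struct memcg_model {
	unsigned long usage;      /* pages charged to the group */
	unsigned long soft_limit; /* soft limit, in pages */
	unsigned long file_lru;   /* active + inactive file pages in this zone */
};

struct zone_model {
	unsigned long pages_scanned;   /* pages scanned since the last free */
	unsigned long pages_reclaimed; /* hypothetical: sum recorded by the last walk */
};

/* Models the hierarchy walk that shrink_zone() already performs. */
static void shrink_zone_model(struct zone_model *zone,
			      const struct memcg_model *groups,
			      size_t nr_groups)
{
	unsigned long sum = 0;
	size_t i;

	assert(zone != NULL);
	assert(groups != NULL || nr_groups == 0);

	for (i = 0; i < nr_groups; i++) {
		/* Only groups above their soft limit are reclaim targets. */
		if (groups[i].usage > groups[i].soft_limit)
			sum += groups[i].file_lru;
	}

	/* Record the result so callers do not have to walk again. */
	zone->pages_reclaimed = sum;
}

/* O(1) check for callers: no hierarchy walk is needed here. */
static bool zone_reclaimable_model(const struct zone_model *zone)
{
	return zone->pages_scanned < zone->pages_reclaimed * 6;
}
```

With three groups of which two are above their soft limit, only those
two contribute to pages_reclaimed, and the zone stops counting as
reclaimable once pages_scanned reaches six times that sum.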

While running tests on the patch, it turned out that I cannot reproduce
the problem (machine hang while over-committing the softlimit) even
without the patch. Then I realized that the problem only exists in our
internal version, where we don't have the check
"sc->priority < DEF_PRIORITY - 2" to bypass the softlimit check. The
reason we did that is to guarantee no global pressure on high-priority
memcgs. In that case, global reclaim can never steal any pages from any
memcgs and the system can easily hang.

This is not the case in the version I am posting here. The patch
guarantees that reclaim does not loop over memcgs that are all under
their softlimit by:
1. detecting whether no memcg is above its softlimit and, if so,
   skipping the softlimit check
2. only checking the softlimit of a memcg if priority is
   >= DEF_PRIORITY - 2
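
The two rules can be modelled as a single predicate. This is a hedged
userspace sketch, not the code from patch 1/5: skip_group_model() and
its parameters are hypothetical names, and DEF_PRIORITY is defined
locally as 12, the value used in mainline kernels.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12 /* local copy of the mainline value */

/*
 * Returns true when global reclaim should skip a group.
 * Hypothetical model of the two rules described above.
 */
static bool skip_group_model(int priority,
			     bool any_group_over_softlimit,
			     bool group_over_softlimit)
{
	/* Rule 1: no group is above its soft limit, so do not check it. */
	if (!any_group_over_softlimit)
		return false;

	/* Rule 2: the soft limit is only honoured at priority >= DEF_PRIORITY - 2. */
	if (priority < DEF_PRIORITY - 2)
		return false;

	/* Otherwise, groups under their soft limit are left alone. */
	return !group_over_softlimit;
}
```

In this model a group under its soft limit is skipped only while some
other group is above its soft limit and the priority has not yet dropped
below DEF_PRIORITY - 2, so reclaim always has something it is allowed to
scan.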

In summary, the problem described in this patch doesn't exist, so I am
thinking of dropping it from my next posting. Please comment.

--Ying

>> @@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
>>       return lru;
>>  }
>>
>> +static inline unsigned long get_lru_size(struct lruvec *lruvec,
>> +                                      enum lru_list lru)
>> +{
>> +     if (!mem_cgroup_disabled())
>> +             return mem_cgroup_get_lru_size(lruvec, lru);
>> +
>> +     return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
>> +}
>> +
>>  static inline unsigned long zone_reclaimable_pages(struct zone *zone)
>>  {
>> -     int nr;
>> +     int nr = 0;
>> +     struct mem_cgroup *memcg;
>> +
>> +     memcg = mem_cgroup_iter(NULL, NULL, NULL);
>> +     do {
>> +             struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>>
>> -     nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>> -          zone_page_state(zone, NR_INACTIVE_FILE);
>> +             if (should_reclaim_mem_cgroup(memcg)) {
>> +                     nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
>> +                           get_lru_size(lruvec, LRU_ACTIVE_FILE);
>
> Sometimes, the number of reclaimable pages DO include those of groups
> for which should_reclaim_mem_cgroup() is false: when the priority
> level is <= DEF_PRIORITY - 2, as you defined in 1/5!  This means that
> you consider pages you just scanned unreclaimable, which can result in
> the zone being unreclaimable after the DEF_PRIORITY - 2 cycle, no?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .