On Fri, Apr 20, 2012 at 2:17 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
> On Tue 17-04-12 09:38:02, Ying Han wrote:
>> This patch reverts all the existing softlimit reclaim implementations and
>> instead integrates the softlimit reclaim into existing global reclaim logic.
>>
>> The new softlimit reclaim includes the following changes:
>>
>> 1. add function should_reclaim_mem_cgroup()
>>
>> Add the filter function should_reclaim_mem_cgroup() under the common function
>> shrink_zone(). The latter is called from both per-memcg reclaim and
>> global reclaim.
>>
>> Today the softlimit takes effect only under global memory pressure. The memcgs
>> run free above their softlimit until there is global memory contention.
>> This patch doesn't change the semantics.
>
> I am not sure I understand but I think it does change the semantics.
> Previously we looked at a group with the biggest excess and reclaimed that
> group _hierarchically_.

Yes, we don't do _hierarchical_ reclaim in this patch. Hmm, that might be
what Johannes insists on preserving in the other thread...?

> Now we do not care about hierarchy for soft
> limit reclaim. Moreover we do kind-of soft reclaim even from hard limit
> reclaim.

Not yet. This patchset only does soft_limit reclaim under global reclaim. The
logic here:

> +	if (target_mem_cgroup || priority <= DEF_PRIORITY - 3 ||
> +			mem_cgroup_soft_limit_exceeded(memcg))
> +		return true;

If target_mem_cgroup != NULL, which means this is a target reclaim, we always
reclaim from the memcg.

>
>> Under the global reclaim, we skip reclaiming from a memcg under its softlimit.
>> To prevent reclaim from trying too hard on memcgs (above softlimit) with
>> only hard-to-reclaim pages, the reclaim priority is used to skip the softlimit
>> check. This is a trade-off between system performance and resource isolation.
>>
>> 2. detect no memcgs above softlimit under zone reclaim.
>>
>> The function zone_reclaimable() marks zone->all_unreclaimable based on
>> per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
>> alloc_pages could go to OOM instead of getting stuck in page reclaim.
>>
>> In a memcg kernel, a cgroup under its softlimit is not targeted by global
>> reclaim. It is possible that all memcgs are under their softlimit for a
>> particular zone. Then direct reclaim in do_try_to_free_pages() will always
>> return 1, which causes the caller __alloc_pages_direct_reclaim() to enter a
>> tight loop.
>>
>> The reclaim priority check we put in should_reclaim_mem_cgroup() should help
>> in this case, but we still don't want to burn cpu cycles on the first few
>> priorities to get to that point. The idea is from the LSF discussion: we
>> detect it after the first round of scanning and restart the reclaim without
>> looking at softlimit at all. This allows us to make forward progress in
>> shrink_zone() and free some pages in the zone.
>>
>> In order to detect this while scanning all the memcgs under shrink_zone(),
>> I have to change mem_cgroup_iter() from a shared walk to a full walk.
>> Otherwise, it would be very easy to skip lots of memcgs above softlimit,
>> causing the flag "ignore_softlimit" to be set mistakenly.
>>
>> Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
>> ---
>>  include/linux/memcontrol.h |   18 +--
>>  include/linux/swap.h       |    4 -
>>  mm/memcontrol.c            |  397 +-------------------------------------------
>>  mm/vmscan.c                |  113 +++++--------
>>  4 files changed, 55 insertions(+), 477 deletions(-)
>>
> [...]
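To make the filter and the restart logic described above concrete, here is a minimal user-space sketch in C. It is not kernel code: struct memcg_model, soft_limit_exceeded(), should_reclaim() and shrink_zone_model() are hypothetical stand-ins for struct mem_cgroup, mem_cgroup_soft_limit_exceeded(), should_reclaim_mem_cgroup() and shrink_zone(), and "shrinking" a group only bumps a counter. The walk visits every group (the full walk the patch switches to) and, if no group passed the filter, restarts once with the soft limit ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* same value as the kernel's DEF_PRIORITY */

/* Hypothetical stand-in for struct mem_cgroup: only what the filter needs. */
struct memcg_model {
	const char *name;
	unsigned long usage;		/* pages charged to the group */
	unsigned long soft_limit;	/* soft limit, in pages */
	int scanned;			/* times this group was "shrunk" */
};

static bool soft_limit_exceeded(const struct memcg_model *m)
{
	return m->usage > m->soft_limit;
}

/* Same three conditions as the patch's should_reclaim_mem_cgroup(). */
static bool should_reclaim(bool target_reclaim, const struct memcg_model *m,
			   int priority)
{
	return target_reclaim || priority <= DEF_PRIORITY - 3 ||
	       soft_limit_exceeded(m);
}

/*
 * Full walk over every group in the zone. If no group passed the filter,
 * restart once with the soft limit ignored. Returns the number of groups
 * shrunk in the final walk.
 */
static int shrink_zone_model(struct memcg_model *memcgs, int n,
			     bool target_reclaim, int priority)
{
	bool ignore_softlimit = false;
	int shrunk;

restart:
	shrunk = 0;
	for (int i = 0; i < n; i++) {
		if (ignore_softlimit ||
		    should_reclaim(target_reclaim, &memcgs[i], priority)) {
			memcgs[i].scanned++;	/* stands in for shrink_mem_cgroup_zone() */
			shrunk++;
		}
	}

	/* The !ignore_softlimit check is this sketch's guard against n == 0. */
	if (shrunk == 0 && !ignore_softlimit) {
		ignore_softlimit = true;
		goto restart;
	}

	/* Forward progress: unless the zone has no groups, something was shrunk. */
	assert(shrunk > 0 || n == 0);
	return shrunk;
}
```

With one group over its soft limit, a global pass at DEF_PRIORITY shrinks only that group. With every group under its soft limit, the first walk shrinks nothing and the restart shrinks all of them, which is the tight-loop escape the changelog describes. At DEF_PRIORITY - 3, or for target reclaim, the filter lets every group through.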
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 1a51868..a5f690b 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -2128,24 +2128,51 @@ restart:
>>  	throttle_vm_writeout(sc->gfp_mask);
>>  }
>>
>> +static bool should_reclaim_mem_cgroup(struct mem_cgroup *target_mem_cgroup,
>> +				      struct mem_cgroup *memcg,
>> +				      int priority)
>> +{
>> +	/* Reclaim from mem_cgroup if any of these conditions are met:
>> +	 * - This is a global reclaim

This comment is wrong and confusing... My fault. It should be "This is a
target reclaim".

>> +	 * - reclaim priority is higher than DEF_PRIORITY - 3
>> +	 * - mem_cgroup exceeds its soft limit
>> +	 *
>> +	 * The priority check is a balance of how hard to preserve the pages
>> +	 * under softlimit. If the memcgs of the zone having trouble to reclaim
>> +	 * pages above their softlimit, we have to reclaim under softlimit
>> +	 * instead of burning more cpu cycles.
>> +	 */
>> +	if (target_mem_cgroup || priority <= DEF_PRIORITY - 3 ||
>> +			mem_cgroup_soft_limit_exceeded(memcg))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  static void shrink_zone(int priority, struct zone *zone,
>>  			struct scan_control *sc)
>>  {
>>  	struct mem_cgroup *root = sc->target_mem_cgroup;
>> -	struct mem_cgroup_reclaim_cookie reclaim = {
>> -		.zone = zone,
>> -		.priority = priority,
>> -	};
>>  	struct mem_cgroup *memcg;
>> +	int above_softlimit, ignore_softlimit = 0;
>> +
>>
>> -	memcg = mem_cgroup_iter(root, NULL, &reclaim);
>> +restart:
>> +	above_softlimit = 0;
>> +	memcg = mem_cgroup_iter(root, NULL, NULL);
>
> I am afraid this will not work for hard-limit reclaim. We need the
> cookie to remember the last memcg we were shrinking from the hierarchy
> otherwise mem_cgroup_reclaim would hammer on the same group again and
> again. Consider
> A (hard limit 30M no pages)
> |- B (10M)
> \- C (20M)
>
> then we could easily end up in OOM, right? And the OOM would be for the
> A group which probably doesn't have any processes in it so we will not
> make any fwd.
> progress.

Err... For some reason I missed the mem_cgroup_iter_break() underneath. I had
been imagining that we walk the whole hierarchy for hard_limit reclaim as
well.

Does it make more sense to walk the hierarchy under A if A hits its limit,
instead of hitting one memcg repeatedly at every priority level?

--Ying

>
>>  	do {
>>  		struct mem_cgroup_zone mz = {
>>  			.mem_cgroup = memcg,
>>  			.zone = zone,
>>  		};
>>
>> -		shrink_mem_cgroup_zone(priority, &mz, sc);
>> +		if (ignore_softlimit ||
>> +		    should_reclaim_mem_cgroup(root, memcg, priority)) {
>> +
>> +			shrink_mem_cgroup_zone(priority, &mz, sc);
>> +			above_softlimit = 1;
>> +		}
>> +
>>  		/*
>>  		 * Limit reclaim has historically picked one memcg and
>>  		 * scanned it with decreasing priority levels until
>> @@ -2160,8 +2187,13 @@ static void shrink_zone(int priority, struct zone *zone,
>>  			mem_cgroup_iter_break(root, memcg);
>>  			break;
>>  		}
>> -		memcg = mem_cgroup_iter(root, memcg, &reclaim);
>> +		memcg = mem_cgroup_iter(root, memcg, NULL);
>>  	} while (memcg);
>> +
>> +	if (!above_softlimit) {
>> +		ignore_softlimit = 1;
>> +		goto restart;
>> +	}
>>  }
>>
>>  /* Returns true if compaction should go ahead for a high-order request */
> [...]
> --
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic
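Michal's A/B/C example can be modelled the same way. In this hypothetical user-space sketch the hierarchy is flattened in pre-order into an array, page counts are abstract units standing in for the 0/10M/20M of the example, and struct iter_cookie is a simplified stand-in for struct mem_cgroup_reclaim_cookie (the real cookie is keyed by zone and priority; here it only remembers where the last walk stopped). limit_reclaim_pass() reclaims from one group and stops, like the limit-reclaim path after mem_cgroup_iter_break(); full_walk_pass() instead visits every group, roughly what the question above proposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one memcg; the hierarchy is flattened in pre-order. */
struct group_model {
	const char *name;
	unsigned long pages;	/* reclaimable pages charged to this group itself */
};

/* Simplified stand-in for struct mem_cgroup_reclaim_cookie. */
struct iter_cookie {
	int position;		/* index of the group to start from next time */
};

/* Reclaim up to 'batch' pages from one group. */
static unsigned long take(struct group_model *g, unsigned long batch)
{
	unsigned long taken = g->pages < batch ? g->pages : batch;

	g->pages -= taken;
	return taken;
}

/*
 * Limit-reclaim style pass: reclaim from a single group, then stop.
 * Without a cookie the walk always starts at the root (index 0).
 */
static unsigned long limit_reclaim_pass(struct group_model *groups, int n,
					struct iter_cookie *cookie,
					unsigned long batch)
{
	int idx;
	unsigned long taken;

	assert(n > 0);
	idx = cookie ? cookie->position : 0;
	assert(idx >= 0 && idx < n);
	taken = take(&groups[idx], batch);
	if (cookie)
		cookie->position = (idx + 1) % n;	/* resume at the next group */
	return taken;
}

/* Full-walk pass: reclaim up to 'batch' pages from every group. */
static unsigned long full_walk_pass(struct group_model *groups, int n,
				    unsigned long batch)
{
	unsigned long total = 0;

	for (int i = 0; i < n; i++)
		total += take(&groups[i], batch);
	return total;
}

/* Repeat limit-reclaim passes, e.g. successive reclaim retries. */
static unsigned long limit_reclaim_repeat(struct group_model *groups, int n,
					  struct iter_cookie *cookie,
					  int passes, unsigned long batch)
{
	unsigned long total = 0;

	for (int i = 0; i < passes; i++)
		total += limit_reclaim_pass(groups, n, cookie, batch);
	return total;
}
```

Without a cookie every pass starts at A, which has no pages, so repeated passes reclaim nothing. With the cookie, three passes reach B and C and reclaim 30 units, and a single full-walk pass reclaims the same 30 units. The sketch does not model the cost of scanning a large hierarchy on every pass.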