On Fri, Aug 17, 2012 at 4:41 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
> On 08/17/2012 07:34 PM, Ying Han wrote:
>>
>> On Thu, Aug 16, 2012 at 8:37 AM, Rik van Riel <riel@xxxxxxxxxx> wrote:
>
>
>>> +       /*
>>> +        * Reclaim from the top scoring lruvec until we freed enough
>>> +        * pages, or its reclaim priority has halved.
>>> +        */
>>> +       do {
>>> +               shrink_lruvec(victim_lruvec, sc);
>>> +               score = reclaim_score(memcg, victim_lruvec);
>>> +       } while (sc->nr_to_reclaim > 0 && score > max_score / 2);
>>
>>
>> This would violate the user expectation of soft_limit badly,
>> especially for background reclaim where nr_to_reclaim equals to
>> ULONG_MAX.
>>
>> Here we keep hitting cgroup A and potentially push it down to
>> softlimit until the score drops to certain level. It is bad since it
>> causes "hot" memory (under softlimit) of A being reclaimed while other
>> cgroups has plenty of "cold" (above softlimit) to give out.
>
>
> Look at the function reclaim_score().
>
> Once a group drops below its soft limit, its score will
> be a factor 10000 smaller, making sure we hit the second
> exit condition.
>
> After that, we will pick another group.
>
>
>> In general, pick one cgroup to reclaim instead of round-robin is ok as
>> long as we don't reclaim further down to the softlimit. The next
>> question then is what's the next cgroup to reclaim if that doesn't
>> give us enough.
>
>
> Again, look at the function reclaim_score().
>
> If there is a group above the softlimit, we pretty much
> guarantee we will reclaim from that group. If any reclaim
> will happen from another group, it will be absolutely
> minimal (taking recent_pressure from 0 to SWAP_CLUSTER_MAX,
> and then moving on to another group).

It seems I should really look into the numbers, which I tried to avoid
at the beginning... :(

Another way to teach myself how it works is to run a sanity test.
Let's say I have two cgroups under root, running different workloads:

root
  ->A (mem_alloc, which keeps touching its working set)
  ->B (streaming IO, like dd)

Here are the test cases off the top of my head, along with the expected
behavior. Forget about the root cgroup for now:

case 1. A & B above softlimit
  a) score(B) > score(A), so keep reclaiming from B
  b) as long as usage(B) > softlimit(B), no reclaim on A
  c) once B is under softlimit, reclaim from A

case 2. A above softlimit and B under softlimit
  a) score(A) > score(B), so keep reclaiming from A
  b) as long as usage(A) > softlimit(A), no reclaim on B
  c) once A is under softlimit, reclaim on both as in case 3

case 3. A & B under softlimit
  a) score(B) > score(A), so keep reclaiming from B
  b) there should be no reclaim on A

My patch delivers the functionality of case 2, but it does not
distribute the pressure across memcgs the way this patch does (cases 1
& 3). Also, in case 3 my patch would scan all the memcgs for nothing,
whereas this patch will eventually pick a memcg to reclaim from. I am
not sure whether that saves much, though.

Of the three cases, I would say case 2 is the basic functionality we
want to guarantee, and cases 1 and 3 are optimizations on top of that.

I would like to run the tests above; please help clarify whether they
make sense.

Thanks

--Ying

>
> --
> All rights reversed
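
P.S. To check my reading of the two exit conditions, here is a toy
userspace model of one reclaim pass. It is only a sketch: toy_group,
toy_score(), toy_shrink() and toy_reclaim_pass() are made-up names, the
score is simply the group's usage (the real reclaim_score() is not
shown in this mail and surely uses more inputs), and the batch size of
32 is an assumed stand-in for SWAP_CLUSTER_MAX. The only things taken
from the thread are the factor of 10000 for a group above its soft
limit and the shape of the do/while loop.

```c
#include <assert.h>

#define TOY_BATCH 32UL     /* assumed stand-in for SWAP_CLUSTER_MAX */
#define TOY_BOOST 10000UL  /* the "factor 10000" mentioned in the thread */

/* Hypothetical stand-in for a memcg; not a kernel structure. */
struct toy_group {
	const char *name;
	unsigned long usage;      /* pages charged */
	unsigned long soft_limit; /* pages */
	unsigned long reclaimed;  /* pages taken by the toy loop */
};

/* Toy score (assumption): usage, boosted while above the soft limit. */
static unsigned long toy_score(const struct toy_group *g)
{
	return g->usage > g->soft_limit ? g->usage * TOY_BOOST : g->usage;
}

/* Take at most one batch from the group; returns the pages reclaimed. */
static unsigned long toy_shrink(struct toy_group *g)
{
	unsigned long n = g->usage < TOY_BATCH ? g->usage : TOY_BATCH;

	g->usage -= n;
	g->reclaimed += n;
	return n;
}

/*
 * One pass: pick the top scoring group, then reclaim from it until we
 * have freed enough pages or its score has halved. Unlike the kernel,
 * the toy decrements nr_to_reclaim directly to keep the model small.
 */
static struct toy_group *toy_reclaim_pass(struct toy_group *groups, int nr,
					  unsigned long nr_to_reclaim)
{
	struct toy_group *victim = &groups[0];
	unsigned long max_score, score;
	int i;

	assert(nr > 0);

	for (i = 1; i < nr; i++)
		if (toy_score(&groups[i]) > toy_score(victim))
			victim = &groups[i];

	max_score = toy_score(victim);
	do {
		unsigned long n = toy_shrink(victim);

		nr_to_reclaim -= n < nr_to_reclaim ? n : nr_to_reclaim;
		score = toy_score(victim);
	} while (nr_to_reclaim > 0 && score > max_score / 2);

	return victim;
}
```

Feeding it case 2 (A at 1000 pages with a 600 page soft limit, B at 500
pages with an 800 page soft limit) and an unbounded nr_to_reclaim, the
pass picks A, takes 13 batches (416 pages), stops with A at 584 pages,
and never touches B. So in this toy the overshoot below the soft limit
is less than one batch, which is what I would want the real test to
confirm.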