On 08/06/2012 05:18 PM, Ying Han wrote:
On Mon, Aug 6, 2012 at 11:51 AM, Rik van Riel <riel@xxxxxxxxxx> wrote:
On 08/06/2012 11:11 AM, Michal Hocko wrote:
On Mon 06-08-12 10:27:25, Rik van Riel wrote:
So you think we shouldn't do the full round over memcgs in shrink_zone()
and should rather do it the OOM way: pick a victim and hammer it?
Don't hammer it too far. Only until its score ends up well
below (25% lower than?) that of the second highest scoring
list.
That way all the lists get hammered a little bit, in turn.
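
A minimal sketch of that stopping rule, for illustration only. The
thread does not define how a score is computed, so the sketch assumes a
hypothetical score that drops by one for every page reclaimed; the
function name, the batch size, and the 25% margin default are likewise
assumptions, not kernel code.

```python
def reclaim_from_top(scores, batch=32, margin=0.25):
    """Reclaim from the highest scoring list until its score is at least
    `margin` below the runner-up's score.

    scores: dict mapping list name -> score (hypothetical: the score
    falls by 1 per page reclaimed). Mutated in place.
    Returns (victim_name, pages_reclaimed).
    """
    if len(scores) < 2:
        raise ValueError("need at least two lists to compare scores")
    ranked = sorted(scores, key=scores.get, reverse=True)
    victim, runner_up = ranked[0], ranked[1]
    target = scores[runner_up] * (1.0 - margin)
    reclaimed = 0
    while scores[victim] > target:
        step = min(batch, scores[victim])
        if step <= 0:
            break  # nothing left to take from this list
        scores[victim] -= step
        reclaimed += step
    return victim, reclaimed
```

Calling it again on the same dict picks whichever list is now on top,
which is how every list ends up being hit a little, in turn.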
How do we provide the soft limit guarantee then?
[...]
The easiest way would be to find the top 2 or 3 scoring memcgs
when we reclaim memory. After reclaiming some pages, recalculate
the scores of just these top lists, and see if the list we started
out with now has a lower score than the second one.
Once we have reclaimed some from each of the 2 or 3 lists, we can
go back and find the highest priority lists again.
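
The steps above can be sketched roughly as follows. This is a sketch
under stated assumptions: score() here is a hypothetical "pages above
the soft limit" measure, reclaim() is a stand-in that just lowers a
usage counter, and the dict layout, top_n, and batch are invented for
the example.

```python
import heapq

def score(g):
    # Hypothetical score: pages above the soft limit.
    return g["usage"] - g["soft_limit"]

def reclaim(g, nr):
    # Stand-in for real reclaim: drop up to nr pages from the group.
    got = min(nr, g["usage"])
    g["usage"] -= got
    return got

def reclaim_pass(groups, top_n=3, batch=32):
    """One pass: find the top_n scoring groups with a single full scan,
    then reclaim from each in turn, rescoring only that short list."""
    top = heapq.nlargest(top_n, groups, key=score)
    visited = []
    while top:
        top.sort(key=score, reverse=True)  # rescore just the short list
        victim = top.pop(0)
        rival = score(top[0]) if top else None
        while reclaim(victim, batch) > 0:
            # Stop once the victim scores below the next candidate; the
            # last candidate has no rival and gets a single batch.
            if rival is None or score(victim) < rival:
                break
        visited.append(victim["name"])
    return visited
```

After the pass returns, the caller would run it again to find the
highest priority lists afresh.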
Sounds like quite a lot of calculation to pick which memcg to reclaim
from, and I wonder if that is necessary at all.
For most of the use cases, we don't need to pick the lowest score
memcg to reclaim from first. My understanding is that if we can
use (lru_size - softlimit) to calculate nr_to_scan, that
is a good step up from what we have today.
If so, can we just keep the round-robin in shrink_zone() and, for
each memcg, calculate nr_to_scan the way get_scan_count() does
today but with the new formula? For a memcg under its softlimit,
we avoid reclaiming pages unless no more pages can be reclaimed
otherwise, and only then start reclaiming below the softlimit. That
part can use the same logic, based on (softlimit - lru_size).
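
A rough sketch of that per-memcg calculation, assuming the scan count
is the excess over the soft limit shifted right by the reclaim
priority. The function names and dict layout are hypothetical. The
thread does not spell out how (softlimit - lru_size) would weight the
fallback, so the fallback below simply uses the plain LRU size as a
placeholder.

```python
def scan_target(lru_size, soft_limit, priority):
    """Pages to scan for one memcg, scaled by its excess over the
    soft limit. Groups at or under the limit get zero."""
    return max(lru_size - soft_limit, 0) >> priority

def shrink_round_robin(groups, priority):
    """groups: list of dicts with 'lru_size' and 'soft_limit'.
    Returns the number of pages to scan for each group, in order."""
    plan = [scan_target(g["lru_size"], g["soft_limit"], priority)
            for g in groups]
    if any(plan):
        return plan
    # Nothing above its soft limit can be scanned: fall back to
    # reclaiming below the soft limit (placeholder weighting).
    return [g["lru_size"] >> priority for g in groups]
```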
If we do the round robin, we will not know in advance whether
or not there are memcgs over (or under) the softlimit.
Another thing to consider is that the round robin code will
always iterate over the cgroups AND try to reclaim a little
from every one of them.
The first version of my code will just iterate over them to
pick the highest priority cgroups, and will then reclaim from
that group (or those groups). This is less work than what your
code does right now.
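
To make the difference in work concrete, here is a toy comparison that
only counts how many groups get a reclaim attempt under each approach.
All names are invented for the example, and both variants still visit
every group once; only the number of reclaim calls differs.

```python
def round_robin(groups, reclaim):
    """Reclaim a little from every group."""
    for g in groups:
        reclaim(g)

def pick_then_reclaim(groups, score, reclaim, top_n=3):
    """Score every group, but reclaim only from the top_n highest."""
    for g in sorted(groups, key=score, reverse=True)[:top_n]:
        reclaim(g)

def count_reclaim_calls(strategy, groups, **kwargs):
    """Run a strategy with a recording reclaim() and count its calls."""
    calls = []
    strategy(groups, reclaim=calls.append, **kwargs)
    return len(calls)
```

With thousands of groups, the first variant attempts reclaim on every
one of them, while the second touches only top_n.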
In the future, we can find a way to sort the cgroups (in a
tree?), so we do not have to walk over all of them.
Some workloads have thousands of cgroups on a system.
Iterating over all of them is not going to scale: it will
be an inconvenience when just calculating the priority,
and has the potential to be a total disaster when doing a
little bit of reclaim from every one of them.
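
One possible shape for keeping the cgroups sorted, so that the highest
scoring one can be found without walking all of them. This sketch
swaps in a binary heap with lazy invalidation rather than a tree; the
class and method names are hypothetical and nothing here reflects
existing kernel code.

```python
import heapq

class ScoreQueue:
    """Max-priority queue of cgroup scores with lazy invalidation."""

    def __init__(self):
        self._heap = []    # entries are (-score, name)
        self._score = {}   # current score per name

    def update(self, name, score):
        # Push a fresh entry; older entries for this name go stale.
        self._score[name] = score
        heapq.heappush(self._heap, (-score, name))

    def pop_highest(self):
        # Discard stale entries until a current one is found.
        while self._heap:
            neg, name = heapq.heappop(self._heap)
            if self._score.get(name) == -neg:
                del self._score[name]
                return name, -neg
        return None
```

Updating a score costs O(log n), and stale entries are skipped when
popped, so finding the top group does not require a full walk.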
Let's look at this one step at a time.
--
All rights reversed