On 08/06/2012 10:03 AM, Michal Hocko wrote:
On Wed 01-08-12 16:10:32, Rik van Riel wrote:
On 08/01/2012 03:04 PM, Ying Han wrote:
That is true. Hmm, then there are two things I can do:
1. for the kswapd case, make sure not to count the root cgroup
2. or check nr_scanned. I like nr_scanned, which tells us
whether or not reclaim ever made any attempt.
I am looking at a more advanced case of (3) right
now. Once I have the basics working, I will send
you a prototype (that applies on top of your patches)
to play with.
Basically, for every LRU in the system, we can keep
track of 4 things:
- reclaim_stat->recent_scanned
- reclaim_stat->recent_rotated
- reclaim_stat->recent_pressure
- LRU size
The first two represent the fraction of pages on the
list that are actively used. The larger the fraction
of recently used pages, the more valuable the cache
is. The inverse of that can be used to show us how
hard to reclaim from this cache, compared to other caches
(everything else being equal).
The recent pressure can be used to keep track of how
many pages we have scanned on each LRU list recently.
Pressure is scaled with LRU size.
This would be the basic formula to decide which LRU
to reclaim from:
        recent_scanned   LRU size
score = -------------- * ---------------
        recent_rotated   recent_pressure
In other words, the less the objects on an LRU are
used, the more we should reclaim from that LRU. The
larger an LRU is, the more we should reclaim from
that LRU.
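
As a rough illustration of the formula above, here is a minimal Python
sketch. The function name lru_score and the +1 terms in the denominators
(added only to avoid dividing by zero) are assumptions made for this
example; they are not part of the proposal and this is not kernel code.

```python
def lru_score(recent_scanned, recent_rotated, lru_size, recent_pressure):
    """Score an LRU list; a higher score means reclaim from it first.

    The +1 in each denominator is an assumption of this sketch, used
    only to avoid division by zero on a freshly created list.
    """
    # Inverse of the fraction of scanned pages that were kept (rotated).
    scan_ratio = recent_scanned / (recent_rotated + 1)
    # Large lists with little recent pressure score higher.
    size_ratio = lru_size / (recent_pressure + 1)
    return scan_ratio * size_ratio


# Two lists of equal size and equal recent pressure: one where few
# scanned pages were rotated (mostly idle), one where most were (hot).
idle_cache = lru_score(1000, 99, 10000, 999)
hot_cache = lru_score(1000, 899, 10000, 999)
print(idle_cache > hot_cache)
```

With everything else equal, the list whose pages are rarely rotated gets
the higher score and is the preferred reclaim target.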
The formula makes sense, but I am afraid that it will be hard to tune it
into something that wouldn't regress. For example, I have seen workloads
with many small groups that wrap up backup jobs. Those groups are scanned
a lot, and you would also see many rotations because of the writeback,
but they are definitely better to scan than a large group which needs to
keep its data resident.
Writeback rotations are not counted in
lruvec->reclaim_stat->recent_rotated - only the rotations
that were done because we really want to keep the page are
counted.
Anyway, I am not saying the score approach is a bad idea but I am afraid
it will be hard to validate and get right.
One thing about the recent_scanned / recent_rotated metric is
that we have been using it since 2.6.28, to balance between
scanning the file and anonymous LRUs.
I believe it would help us balance between multiple sets of
LRUs, too.
The more we have already scanned an LRU, the lower
its score becomes. At some point, another LRU will
have the top score, and that will be the target to
scan.
So you think we shouldn't do the full round over memcgs in shrink_zone
and should rather do it the OOM way: pick a victim and hammer it?
Not hammer it too far. Only until its score ends up well
below (25% lower?) that of the second highest scoring
list.
That way all the lists get hammered a little bit, in turn.
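
To make the "hammer it a little, in turn" idea concrete, here is a small
Python sketch. Everything in it is hypothetical: the Lru class, the
reclaim_round function, the batch size of 32 and the max_batches cap are
invented for illustration, and only recent_pressure is updated while
scanning (real reclaim would also change recent_scanned and
recent_rotated). The 25% margin is the tentative figure from above.

```python
from dataclasses import dataclass


@dataclass
class Lru:
    name: str
    size: int
    recent_scanned: int
    recent_rotated: int
    recent_pressure: int = 0

    def score(self):
        # Same formula as above; the +1 terms avoid division by zero
        # and are an assumption of this sketch.
        return (self.recent_scanned / (self.recent_rotated + 1)) * (
            self.size / (self.recent_pressure + 1)
        )


def reclaim_round(lrus, batch=32, margin=0.25, max_batches=1000):
    """Scan the top-scoring LRU until its score is `margin` below the
    runner-up's score. Needs at least two lists. Returns the victim's
    name and the number of batches scanned."""
    ranked = sorted(lrus, key=Lru.score, reverse=True)
    victim, runner_up = ranked[0], ranked[1]
    target = runner_up.score() * (1 - margin)
    batches = 0
    while victim.score() > target and batches < max_batches:
        # Simplification: scanning only raises the recorded pressure.
        victim.recent_pressure += batch
        batches += 1
    return victim.name, batches


lrus = [Lru("a", 10000, 1000, 99), Lru("b", 10000, 1000, 199)]
first, _ = reclaim_round(lrus)
second, _ = reclaim_round(lrus)
print(first, second)
```

Because scanning lowers the victim's score, the next round picks a
different list, so pressure is spread across the lists rather than
concentrated on one.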
We can adjust the score for different LRUs in different
ways, eg.:
- swappiness adjustment for file vs anon LRUs, within
an LRU set
- if an LRU set contains a file LRU with more inactive
than active pages, reclaim from this LRU set first
- if an LRU set is over its soft limit, reclaim from
this LRU set first
maybe we could replace LRU size by (LRU size - soft_limit) in the above
formula?
Good idea, that could work.
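
A sketch of that substitution, in Python for illustration only. The
function name soft_limit_score, the clamp of the excess at zero (so that
a set under its soft limit scores 0 rather than negative) and the +1
terms in the denominators are all assumptions of this example; the
thread does not say how a set below its soft limit should be handled.

```python
def soft_limit_score(recent_scanned, recent_rotated, lru_size,
                     soft_limit, recent_pressure):
    """Score using (LRU size - soft_limit) in place of LRU size."""
    # Assumption: clamp at zero so sets under their soft limit are
    # never preferred over sets that exceed it.
    excess = max(lru_size - soft_limit, 0)
    # The +1 terms only guard against division by zero.
    return (recent_scanned / (recent_rotated + 1)) * (
        excess / (recent_pressure + 1)
    )


over = soft_limit_score(1000, 99, 10000, 4000, 999)
under = soft_limit_score(1000, 99, 3000, 4000, 999)
print(over > under)
```

With this variant, a set well above its soft limit outranks a set of
similar usage that is below it.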
--
All rights reversed