On Mon 06-08-12 10:27:25, Rik van Riel wrote:
> On 08/06/2012 10:03 AM, Michal Hocko wrote:
> >On Wed 01-08-12 16:10:32, Rik van Riel wrote:
> >>On 08/01/2012 03:04 PM, Ying Han wrote:
> >>
> >>>That is true. Hmm, then two things i can do:
> >>>
> >>>1. for kswapd case, make sure not counting the root cgroup
> >>>2. or check nr_scanned. I like the nr_scanned which is telling us
> >>>whether or not the reclaim ever make any attempt ?
> >>
> >>I am looking at a more advanced case of (3) right
> >>now. Once I have the basics working, I will send
> >>you a prototype (that applies on top of your patches)
> >>to play with.
> >>
> >>Basically, for every LRU in the system, we can keep
> >>track of 4 things:
> >>- reclaim_stat->recent_scanned
> >>- reclaim_stat->recent_rotated
> >>- reclaim_stat->recent_pressure
> >>- LRU size
> >>
> >>The first two represent the fraction of pages on the
> >>list that are actively used. The larger the fraction
> >>of recently used pages, the more valuable the cache
> >>is. The inverse of that can be used to show us how
> >>hard to reclaim this cache, compared to other caches
> >>(everything else being equal).
> >>
> >>The recent pressure can be used to keep track of how
> >>many pages we have scanned on each LRU list recently.
> >>Pressure is scaled with LRU size.
> >>
> >>This would be the basic formula to decide which LRU
> >>to reclaim from:
> >>
> >>          recent_scanned       LRU size
> >>score =   -------------- * ----------------
> >>          recent_rotated    recent_pressure
> >>
> >>
> >>In other words, the less the objects on an LRU are
> >>used, the more we should reclaim from that LRU. The
> >>larger an LRU is, the more we should reclaim from
> >>that LRU.
> >
> >The formula makes sense but I am afraid that it will be hard to tune it
> >into something that wouldn't regress. For example I have seen workloads
> >which had many small groups which are used to wrap up backup jobs and
> >those are scanned a lot, you would see also many rotations because of
> >the writeback but those are definitely good to scan rather than a large
> >group which needs to keep its data resident.
>
> Writeback rotations are not counted in
> lruvec->reclaim_stat->recent_rotated - only the rotations
> that were done because we really want to keep the page are
> counted.

OK. I missed that.

> >Anyway, I am not saying the score approach is a bad idea but I am afraid
> >it will be hard to validate and make it right.
>
> One thing about the recent_scanned / recent_rotated metric is
> that we have been using it since 2.6.28, to balance between
> scanning the file and anonymous LRUs.
>
> I believe it would help us balance between multiple sets of
> LRUs, too.
>
> >>The more we have already scanned an LRU, the lower
> >>its score becomes. At some point, another LRU will
> >>have the top score, and that will be the target to
> >>scan.
> >
> >So you think we shouldn't do the full round over memcgs in shrink_zone a
> >and rather do it oom way to pick up a victim and hammer it?
>
> Not hammer it too far. Only until its score ends up well
> below (25% lower?) than that of the second highest scoring
> list.
>
> That way all the lists get hammered a little bit, in turn.

How do we provide the soft limit guarantee then?

[...]
-- 
Michal Hocko
SUSE Labs
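
For illustration only, the scoring rule and the "scan until the score is roughly 25% below the runner-up" idea quoted above could be sketched in user-space C as follows. This is not kernel code and not the prototype mentioned in the thread. The struct layout, the function names (lru_score, pick_victim, scan_until_below), the +1 guards against division by zero, the 0.75 threshold factor, and the simplification that scanning only increases recent_pressure are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-LRU bookkeeping. The field names follow the thread,
 * but this struct is an illustration, not a kernel data structure. */
struct lru_stats {
    unsigned long recent_scanned;   /* pages scanned recently */
    unsigned long recent_rotated;   /* pages kept because they were in use */
    unsigned long recent_pressure;  /* scan pressure already applied */
    unsigned long lru_size;         /* pages currently on the list */
};

/* score = (recent_scanned / recent_rotated) * (LRU size / recent_pressure).
 * The +1 in each denominator avoids division by zero (an assumption). */
double lru_score(const struct lru_stats *s)
{
    double coldness = (double)s->recent_scanned /
                      (double)(s->recent_rotated + 1);
    double size_vs_pressure = (double)s->lru_size /
                              (double)(s->recent_pressure + 1);
    return coldness * size_vs_pressure;
}

/* Return the index of the highest-scoring LRU among n candidates. */
size_t pick_victim(const struct lru_stats *lrus, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (lru_score(&lrus[i]) > lru_score(&lrus[best]))
            best = i;
    }
    return best;
}

/* Apply pressure to the victim in batches until its score drops below
 * 75% of the runner-up's score, or max_batches is reached.
 * Simplification: scanning only raises recent_pressure here.
 * Returns the number of batches applied. */
unsigned long scan_until_below(struct lru_stats *victim,
                               double runner_up_score,
                               unsigned long batch,
                               unsigned long max_batches)
{
    unsigned long batches = 0;
    while (batches < max_batches &&
           lru_score(victim) >= 0.75 * runner_up_score) {
        victim->recent_pressure += batch;
        batches++;
    }
    return batches;
}
```

In this sketch a list with few rotations relative to scans (a "cold" cache) scores higher than an equally sized list with many rotations, and each batch of pressure lowers the victim's score until another list takes the top spot. The sketch does not address the soft limit question raised at the end of the message.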