On Wed 01-08-12 16:10:32, Rik van Riel wrote:
> On 08/01/2012 03:04 PM, Ying Han wrote:
> 
> > That is true. Hmm, then two things I can do:
> >
> > 1. for the kswapd case, make sure not to count the root cgroup
> > 2. or check nr_scanned. I like nr_scanned, which tells us
> >    whether or not the reclaim ever made any attempt.
> 
> I am looking at a more advanced case of (3) right
> now. Once I have the basics working, I will send
> you a prototype (that applies on top of your patches)
> to play with.
> 
> Basically, for every LRU in the system, we can keep
> track of 4 things:
> - reclaim_stat->recent_scanned
> - reclaim_stat->recent_rotated
> - reclaim_stat->recent_pressure
> - LRU size
> 
> The first two represent the fraction of pages on the
> list that are actively used. The larger the fraction
> of recently used pages, the more valuable the cache
> is. The inverse of that can be used to show us how
> hard to reclaim this cache, compared to other caches
> (everything else being equal).
> 
> The recent pressure can be used to keep track of how
> many pages we have scanned on each LRU list recently.
> Pressure is scaled with LRU size.
> 
> This would be the basic formula to decide which LRU
> to reclaim from:
> 
>           recent_scanned       LRU size
>   score = -------------- * ----------------
>           recent_rotated   recent_pressure
> 
> In other words, the less the objects on an LRU are
> used, the more we should reclaim from that LRU. The
> larger an LRU is, the more we should reclaim from
> that LRU.

The formula makes sense, but I am afraid it will be hard to tune into
something that doesn't regress. For example, I have seen workloads with
many small groups used to wrap up backup jobs; those groups are scanned
a lot and show many rotations because of writeback, yet they are
definitely better candidates to scan than a large group which needs to
keep its data resident. Anyway, I am not saying the score approach is a
bad idea, but I am afraid it will be hard to validate and get right.

> The more we have already scanned an LRU, the lower
> its score becomes. At some point, another LRU will
> have the top score, and that will be the target to
> scan.

So you think we shouldn't do the full round over memcgs in shrink_zone
and should rather do it the OOM way: pick a victim and hammer it?

> We can adjust the score for different LRUs in different
> ways, eg.:
> - swappiness adjustment for file vs anon LRUs, within
>   an LRU set
> - if an LRU set contains a file LRU with more inactive
>   than active pages, reclaim from this LRU set first
> - if an LRU set is over its soft limit, reclaim from
>   this LRU set first

Maybe we could replace LRU size with (LRU size - soft_limit) in the
above formula?

> This also gives us a nice way to balance memory pressure
> between zones, etc...

-- 
Michal Hocko
SUSE Labs
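
[Editor's note: below is a minimal user-space sketch of the scoring
formula quoted above, just to show how the relative comparison between
LRUs would behave. The struct layout, the fixed-point scale factor and
the +1 guards against division by zero are illustrative assumptions;
recent_pressure is only proposed in the thread and is not an existing
reclaim_stat field, and this is not code from any posted patch.]

#include <stdio.h>

/*
 * Sketch of the per-LRU score:
 *
 *   score = (recent_scanned / recent_rotated) * (lru_size / recent_pressure)
 *
 * Field names mirror the counters listed in the mail; the values here
 * are made up for demonstration.
 */
struct lru_score_input {
	unsigned long recent_scanned;	/* pages scanned recently */
	unsigned long recent_rotated;	/* scanned pages found referenced and rotated */
	unsigned long recent_pressure;	/* recent scan pressure, scaled with LRU size */
	unsigned long lru_size;		/* pages currently on the LRU */
};

static unsigned long lru_score(const struct lru_score_input *in)
{
	/* fixed-point scale so integer division does not truncate to zero */
	unsigned long scale = 1024;
	unsigned long usage = in->recent_scanned * scale /
			      (in->recent_rotated + 1);

	return usage * in->lru_size / (in->recent_pressure + 1);
}

int main(void)
{
	/* rarely used cache: few rotations relative to pages scanned */
	struct lru_score_input cold = { 1000, 50, 100, 100000 };
	/* heavily used cache: most scanned pages were rotated back */
	struct lru_score_input hot  = { 1000, 900, 100, 100000 };

	printf("cold LRU score: %lu\n", lru_score(&cold));
	printf("hot  LRU score: %lu\n", lru_score(&hot));
	return 0;
}

With equal size and pressure, the rarely used LRU scores far higher, so
it would be picked for reclaim first; real kernel code would also have
to decay recent_pressure over time, which the sketch ignores.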