On Fri, Jan 28, 2011 at 10:35:39AM +0000, Mel Gorman wrote:
> I'd be ok with high+low as a starting point to solve the immediate
> problem of way too much memory being free and then treat "kswapd must go
> to sleep" as a separate problem. I'm less keen on 1% but only because it
> could be too large a value.

min(1%, low) sounds best to me, because on the 4G system "low" is
likely bigger than 1%.

But really, to me it sounds best to apply my first patch, stick to
the high watermark and remove the gap.

What is going on is that the dma and pci32 zones are at high+gap and
the over-4g zone is at "high". kswapd keeps running until all zones
are above high. But as long as there's at least one zone not over
high, the others are shrunk up to high+gap. The allocator is taught
that it should always try to allocate from the over-4g zone. And the
over-4g zone is never below the "low" wmark because 100% of the cache
is clean, so kswapd keeps the normal and dma zones at high+gap and
the over-4g zone at "high".

In a previous email you asked me how kswapd gets stuck in D state and
never stops working, and said that it should stop earlier. This
sounds impossible: kswapd behavior can't possibly change, there is
simply less memory freed when that "gap" is lowered. Also, you can
make the gap as big as you want but it'll only make a difference the
first time; after that kswapd will stop shrinking the normal and dma
zones when they reach high+gap, regardless of the gap size. So kswapd
can't possibly change behavior and it can't possibly be in D state
just because this "gap" size changed. Which is why I think the gap
should be zero and I'd like my first patch to be applied.

There's no point in wasting ram on a feature that can't guarantee we
rotate the zone allocation.

The balancing problem can't be solved in kswapd. It can only be
solved in the allocator, if you really aim to give more rotation to
the lrus. As long as the over-4g zone is allocated from first, at
some point the lrus in the normal/dma zones will have to stop rotating.
Either that or kswapd will shrink 100% of the ram in the dma/normal
zones, which would destroy all the cache, and that is clearly wrong.
And if you change the allocator to allocate in rotation from the 3
zones (clearly we would never want to allocate from the dma zone, so
it's a magic area here) there is absolutely no need for any "gap" in
kswapd to keep the shrinking balanced.

In short, I think tackling the zone balancing problem in kswapd is
wrong and kswapd should stick to the high wmark only. If you care
about zone balancing it should be done in the allocator only, and
then kswapd will cope with whatever the allocator decides just fine.

I guess the LRU caching behavior of a 4g system with a little memory
over 4g is going to be worse than if you boot with mem=4g, and
there's nothing kswapd can do about it as long as the allocator
always grabs the new cache page from the highest zone. Clearly on a
64bit system allocating below 4g may be ok, but on a 32bit system
allocating in the normal zone below 800m must be absolutely avoided.
So it's not a simple problem. Personally I never liked per-zone lru
because of this. But kswapd isn't the solution: it just wastes memory
with no possible benefit, except for the first 5sec when the free
memory goes up from 170M to 700M, and then it remains stuck at 700M
while cp runs for another 2 hours to read all 500G of hd.
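
To illustrate the point about the gap, here is a minimal Python sketch of a toy model. It is not kernel code. It assumes three zones with made-up high watermarks (in pages), an allocator that only ever takes pages from the top zone, reclaim that always succeeds because all the cache is clean, and one fixed gap shared by every zone. The names Zone and run, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    high: int  # high watermark in pages (illustrative value)
    free: int  # currently free pages


def run(gap: int, steps: int = 1000, alloc: int = 32) -> dict:
    """Toy model of a long streaming read (like cp) against three zones.

    Assumptions: the allocator only takes pages from the top zone, and
    kswapd can always reclaim `alloc` pages per zone per step because
    all of the cache is clean.
    """
    zones = [
        Zone("dma", 10, 10),
        Zone("dma32", 1000, 1000),
        Zone("over4g", 300, 300),
    ]
    top = zones[-1]
    for _ in range(steps):
        # New page cache is always allocated from the top zone.
        top.free -= alloc
        # kswapd works as long as any zone is below its high watermark...
        if any(z.free < z.high for z in zones):
            # ...and while it works, every zone is shrunk up to high + gap.
            for z in zones:
                target = z.high + gap
                if z.free < target:
                    z.free = min(target, z.free + alloc)
    return {z.name: z.free for z in zones}


if __name__ == "__main__":
    for gap in (0, 100, 500):
        print(gap, run(gap))
```

In this model the lower zones climb to high+gap within the first few steps and then stay there however long the copy runs, while the top zone hovers at high. A larger gap only raises the plateau of free memory in the lower zones; it does not make their lrus rotate.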