On 11/01/2010 03:43 PM, Mandeep Singh Baines wrote:
Yes, this prevents you from reclaiming the active list all at once. But if the memory pressure doesn't go away, you'll start to reclaim the active list little by little. First you'll empty the inactive list, and then you'll start scanning the active list and moving pages from active to inactive. The problem is that there is no minimum time a page will sit in the inactive list before it is reclaimed. It depends only on the scan rate, which is not tied to time. In my experiments, I saw the active list get smaller and smaller over time until eventually it was only a few MB, at which point the system came grinding to a halt due to thrashing.
I believe that changing the active/inactive ratio has other potential thrashing issues. Specifically, when the inactive list is too small, pages may not stick around long enough to be accessed multiple times and get promoted to the active list, even when they are in active use.

I prefer a more flexible solution that automatically does the right thing.

The problem you see is that the file list gets reclaimed very quickly, even when it is already very small.

I wonder if a possible solution would be to limit how fast file pages get reclaimed when the page cache is very small. Say, inactive_file + active_file < 2 * zone->pages_high ?

At that point, maybe we could slow down the reclaiming of page cache pages to be significantly slower than they can be refilled by the disk. Maybe 100 pages a second - that can be refilled even by an actual spinning metal disk without even the use of readahead.

That can be rounded up to one batch of SWAP_CLUSTER_MAX file pages every 1/4 second, when the number of page cache pages is very low.

This way HPC and virtual machine hosting nodes can still get rid of totally unused page cache, but on any system that actually uses page cache, some minimal amount of cache will be protected under heavy memory pressure.

Does this sound like a reasonable approach?

I realize the threshold may have to be tweaked...

The big question is, how do we integrate this with the OOM killer? Do we pretend we are out of memory when we've hit our file cache eviction quota and kill something?

Would there be any downsides to this approach?

Are there any volunteers for implementing this idea? (Maybe someone who needs the feature?)
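To make the throttling idea concrete, here is a rough user-space sketch, not kernel code and not a patch. It assumes the low-cache test is the sum of the two file lists compared against twice the high watermark, and that the quota is one SWAP_CLUSTER_MAX batch per 250 ms window. The struct zone_sim type, its fields, and both helper functions are hypothetical names made up for illustration. Only SWAP_CLUSTER_MAX (32 in the kernel) and the 1/4 second period come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWAP_CLUSTER_MAX 32UL          /* kernel reclaim batch size */
#define FILE_RECLAIM_PERIOD_MS 250     /* one batch every 1/4 second */

/* Hypothetical per-zone state; names only loosely mirror the kernel. */
struct zone_sim {
	unsigned long inactive_file;       /* pages on the inactive file list */
	unsigned long active_file;         /* pages on the active file list */
	unsigned long pages_high;          /* high watermark, in pages */
	uint64_t window_start_ms;          /* start of the current quota window */
	unsigned long reclaimed_in_window; /* file pages granted in this window */
};

/* Assumed threshold: the page cache counts as "very small" below 2 * pages_high. */
static bool file_cache_is_low(const struct zone_sim *z)
{
	return z->inactive_file + z->active_file < 2 * z->pages_high;
}

/*
 * Return how many of nr_wanted file pages the caller may reclaim at now_ms.
 * Unthrottled while the cache is large; otherwise capped at one
 * SWAP_CLUSTER_MAX batch per window. The caller is expected to update
 * the list counters itself after actually reclaiming.
 */
static unsigned long file_reclaim_allowance(struct zone_sim *z,
					    uint64_t now_ms,
					    unsigned long nr_wanted)
{
	unsigned long budget;

	assert(now_ms >= z->window_start_ms); /* time must not go backwards */

	if (!file_cache_is_low(z))
		return nr_wanted;

	/* Open a fresh window once the previous one has expired. */
	if (now_ms - z->window_start_ms >= FILE_RECLAIM_PERIOD_MS) {
		z->window_start_ms = now_ms;
		z->reclaimed_in_window = 0;
	}

	budget = SWAP_CLUSTER_MAX - z->reclaimed_in_window;
	if (nr_wanted > budget)
		nr_wanted = budget;
	z->reclaimed_in_window += nr_wanted;
	return nr_wanted;
}
```

With 32 pages per 250 ms this works out to 128 pages a second, slightly above the 100 pages a second mentioned above. A return value of 0 is the point where reclaim has hit its file cache eviction quota, which is exactly where the OOM killer question comes in. The sketch deliberately does not answer it.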