On Tue, 2010-11-23 at 10:44 +0100, Peter Schüller wrote:
> > You don't have anybody messing with /proc/sys/vm/drop_caches, do you?
>
> Highly unlikely given that (1) evictions, while often very
> significant, are usually not *complete* (although the first graph
> example I provided had a more or less complete eviction) and (2) the
> evictions are not obviously periodic indicating some kind of cron job,
> and (3) we see the evictions happening across a wide variety of
> machines.
>
> So yes, I feel confident that we are not accidentally doing that.

Yeah, drop_caches doesn't seem very likely.

Your postgres data looks the cleanest and is probably the easiest to
analyze. Might as well start there:

	http://files.spotify.com/memcut/postgresql_weekly.png

As you said, it might not be the same as the others, but it's a decent
place to start. If someone used drop_caches or if someone was randomly
truncating files, we'd expect to see the active/inactive lines both drop
by roughly equal amounts, and see them drop at _exactly_ the same time
as the cache eviction. The eviction about 1/3 of the way through
Wednesday in the above graph kinda looks this way, but it's the
exception.

Just eyeballing it, _most_ of the evictions seem to happen after some
movement in the active/inactive lists. We see an "inactive" uptick as
we start to launder pages, and the page activation doesn't keep up with
it. This is a _bit_ weird since we don't see any slab cache or other
users coming to fill the new space. Something _wanted_ the memory, so
why isn't it being used?

Do you have any large page (hugetlbfs) or other higher-order (> 1 page)
allocations happening in the kernel?

If you could start recording /proc/{vmstat,buddyinfo,meminfo,slabinfo},
it would be immensely useful. The munin graphs are really great, but
they don't have the detail which you can get from stuff like vmstat.

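In case it helps, here is a rough sketch of one way to record those
files at a fixed interval. It is only an illustration: the output
directory, the interval, and the function names are all made up, and it
assumes /proc/buddyinfo is the buddy allocator file meant above.

```python
import time
from pathlib import Path

# Files to record. /proc/buddyinfo is an assumption about which file
# was meant by the buddy allocator statistics.
PROC_FILES = ["/proc/vmstat", "/proc/buddyinfo", "/proc/meminfo", "/proc/slabinfo"]


def snapshot(paths, out_dir, stamp=None):
    """Copy each readable file in `paths` to out_dir/<stamp>/.

    Returns the list of file names that were actually saved.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(out_dir) / stamp
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for p in map(Path, paths):
        try:
            data = p.read_text()
        except OSError:
            # Missing or unreadable file (slabinfo may need root): skip it.
            continue
        (dest / p.name).write_text(data)
        saved.append(p.name)
    return saved


def record(out_dir="/var/tmp/vmlog", interval=10):
    """Take a snapshot every `interval` seconds until interrupted."""
    while True:
        snapshot(PROC_FILES, out_dir)
        time.sleep(interval)
```

Calling record() as root and leaving it running across one of the
evictions should give a before/after pair of snapshots to diff. A
10 second interval is a guess; something shorter may be needed if the
evictions turn out to be fast.
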
> Further, we have observed the kernel's unwillingness to retain data in
> page cache under interesting circumstances:
>
> (1) page cache eviction happens
> (2) we warm up our BDB files by cat:ing them (simple but effective)
> (3) within a matter of minutes, while there is still several GB of
> free (truly free, not page cached), these are evicted (as evidenced by
> re-cat:ing them a little while later)
>
> This latest observation we understand may be due to NUMA related
> allocation issues, and we should probably try to use numactl to ask
> for a more even allocation. We have not yet tried this. However, it is
> not clear how any issues having to do with that would cause sudden
> eviction of data already *in* the page cache (on whichever node).

For a page-cache-heavy workload where you care a lot more about things
being _in_ cache rather than having good NUMA locality, you probably
want "zone_reclaim_mode" set to 0:

	http://www.kernel.org/doc/Documentation/sysctl/vm.txt

That'll be a bit more comprehensive than messing with numactl. It
really is the best thing if you just don't care about NUMA latencies all
that much.

What kind of hardware is this, btw?

-- Dave