Google uses an automated system to assign compute jobs to individual machines within a cluster. To improve memory utilization in the cluster, this system collects memory utilization statistics for each cgroup on each machine.

The following properties are desired for the working set estimation mechanism:

- Low impact on the normal MM algorithms: we don't want to stress the VM just by enabling working set estimation;
- Collected statistics should be comparable across multiple machines: we don't just want to know which cgroup to reclaim from on an individual machine, we also need to know which machine is best to target a job onto within a large cluster;
- Low, predictable CPU usage;
- Among cold pages, differentiate between those that are immediately reclaimable and those that would require a disk write.

We use a very simple approach, scanning memory at a fixed rate and identifying pages that haven't been touched for a given number of scans. We are currently switching from a fakenuma-based implementation (which we don't think is very upstreamable) to a memcg-based one. We think this could be of interest to the wider community and would like to discuss requirements with other interested folks.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
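
For illustration only, the scanning approach described in this message (scan at a fixed rate, count how many consecutive scans each page has gone untouched, then split cold pages by whether they would need a disk write) can be sketched as a small user-space simulation. This is a toy model under stated assumptions, not the kernel implementation: the `Page` class, the `scan`, `touch`, and `cold_stats` functions, and the threshold of 2 scans are all hypothetical names and values chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # All fields are hypothetical stand-ins for per-page kernel state.
    accessed: bool = False   # models an "accessed since last scan" bit
    dirty: bool = False      # page would need writeback before reclaim
    idle_scans: int = 0      # consecutive scans with no access observed


def touch(page, write=False):
    """Simulate an access that happens between two scans."""
    page.accessed = True
    if write:
        page.dirty = True


def scan(pages):
    """One fixed-rate scan pass: age untouched pages, reset touched ones."""
    for page in pages:
        if page.accessed:
            page.accessed = False   # clear so the next scan sees fresh accesses
            page.idle_scans = 0
        else:
            page.idle_scans += 1


def cold_stats(pages, threshold):
    """Count pages idle for at least `threshold` scans, split by dirtiness."""
    cold = [p for p in pages if p.idle_scans >= threshold]
    dirty = sum(1 for p in cold if p.dirty)
    return {"cold_clean": len(cold) - dirty, "cold_dirty": dirty}


if __name__ == "__main__":
    pages = [Page() for _ in range(4)]
    touch(pages[1], write=True)      # written once, then never touched again
    for _ in range(3):
        touch(pages[0])              # page 0 is touched before every scan
        scan(pages)
    # prints {'cold_clean': 2, 'cold_dirty': 1}
    print(cold_stats(pages, threshold=2))
```

Because the scan rate and threshold are fixed, the resulting counts measure idleness in the same units on every machine, which is what would make them comparable across a cluster in a model like this one.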