[LSF/MM TOPIC][ATTEND] cold page tracking / working set estimation

Michel Lespinasse <walken@xxxxxxxxxx> · Fri, 4 Feb 2011 15:40:24 -0800

Google uses an automated system to assign compute jobs to individual
machines within a cluster. In order to improve memory utilization in
the cluster, this system collects memory utilization statistics for
each cgroup on each machine. The following properties are desired for
the working set estimation mechanism:

- Low impact on the normal MM algorithms - we don't want to stress the
VM just by enabling working set estimation;

- Collected statistics should be comparable across multiple machines -
we don't just want to know which cgroup to reclaim from on an
individual machine, we also need to know which machine is best to
target a job onto within a large cluster;

- Low, predictable CPU usage;

- Among cold pages, differentiate between those that are immediately
reclaimable and those that would require a disk write.

We use a very simple approach: scanning memory at a fixed rate and
identifying pages that haven't been touched in a number of scans. We
are currently switching from a fakenuma-based implementation (which we
don't think is very upstreamable) to a memcg-based one. We think this
could be of interest to the wider community and would like to discuss
requirements with other interested folks.
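
To make the scanning idea concrete, here is a rough userspace sketch in
Python. It is only an illustration, not the fakenuma or memcg code: the
Page class, the scan() and cold_summary() helpers, and the threshold of 3
scans are all made up for this example, and the "accessed" flag merely
stands in for a per-page referenced bit.

```python
from dataclasses import dataclass

# Hypothetical value: scans without a touch before a page counts as cold.
COLD_THRESHOLD = 3


@dataclass
class Page:
    accessed: bool = False  # stands in for a per-page "referenced" bit
    dirty: bool = False     # True if reclaiming would need a disk write
    idle_scans: int = 0     # consecutive scans in which the page was untouched


def scan(pages):
    """One fixed-rate pass: age untouched pages, reset touched ones."""
    for page in pages:
        if page.accessed:
            page.idle_scans = 0
            page.accessed = False  # clear so the next scan sees only new touches
        else:
            page.idle_scans += 1


def cold_summary(pages, threshold=COLD_THRESHOLD):
    """Count cold pages, split by whether reclaim would need writeback."""
    cold = [p for p in pages if p.idle_scans >= threshold]
    dirty = sum(1 for p in cold if p.dirty)
    return {"cold_clean": len(cold) - dirty, "cold_dirty": dirty}


def demo():
    # Pages 0 and 1 are never touched; page 2 is touched before every scan;
    # page 3 is touched once, before the second scan.
    pages = [Page(), Page(dirty=True), Page(), Page()]
    for i in range(3):
        pages[2].accessed = True
        if i == 1:
            pages[3].accessed = True
        scan(pages)
    return cold_summary(pages)


if __name__ == "__main__":
    print(demo())
```

Each scan does a constant amount of work per page, so the cost depends
only on the scan rate and the amount of memory, and the idle counters
mean the same thing on every machine that uses the same scan interval.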

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
