[LSF/MM TOPIC] memory compaction and anti-fragmentation

Vlastimil Babka <vbabka@xxxxxxx> · Mon, 15 Dec 2014 11:11:59 +0100

Hi,

this topic still looks far from being a solved problem. It's also a natural
complement to the topics of THP and CMA.

When we discussed compaction at the last LSF/MM [1], we noted the presence of
bugs and overhead. Both have improved since, but even today we still find quite
old bugs and deal with reports of excessive overhead, and the success rates are
still not great.

So here's a list of compaction subtopics/questions that I think could be discussed.

- As it turns out, surprising behavior can still show up in compaction code, and
often we don't have a good idea of what it's really doing. I've been using
ad-hoc (and ugly) tracepoints locally for specific issues, and Joonsoo has
recently posted a more polished set of tracepoints. Is this enough, or do we
need more tracepoints or vmstat entries? What about postprocessing of the
traces: should there be shared tools?

- For testing, which benchmarks should we use? I (and others) have been relying
on stress-highalloc from mmtests, but I'm aware it's quite artificial, so its
results could be misleading. Is there anything more representative that
doesn't need hours to run for a single data point?

- How can we better decide when to try compaction, and for how long? Is the
deferred compaction mechanism enough? Given how we've reduced the amount of
synchronous compaction compared to asynchronous, it's possible that deferred
compaction is not triggered often enough. For asynchronous compaction we
currently quit when we detect lock contention or need_resched(). We have briefly
discussed on linux-mm with David Rientjes whether this makes sense, or whether
there should instead be a limit on the number of scanned pages per invocation.
User-tunable or automatically tunable, perhaps?
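
To make the deferral question concrete, here is a toy model of an
exponential-backoff deferral scheme. It is only a sketch, loosely modelled on
the idea of deferred compaction: the class name, method names and the cap of 6
are assumptions made for illustration, not kernel code.

```python
# Toy model of exponential-backoff deferral. All names and the cap value are
# assumptions for illustration; this is not the kernel implementation.
MAX_DEFER_SHIFT = 6  # assumed cap: the backoff window is at most 2**6 attempts


class ZoneDeferState:
    def __init__(self):
        self.considered = 0   # attempts seen since the last failure
        self.defer_shift = 0  # log2 of the current backoff window

    def record_failure(self):
        """A compaction attempt failed: double the window, up to the cap."""
        self.considered = 0
        self.defer_shift = min(self.defer_shift + 1, MAX_DEFER_SHIFT)

    def record_success(self):
        """A compaction attempt succeeded: drop the backoff entirely."""
        self.considered = 0
        self.defer_shift = 0

    def should_defer(self):
        """Return True if this compaction attempt should be skipped."""
        limit = 1 << self.defer_shift
        self.considered = min(self.considered + 1, limit)
        return self.considered < limit
```

In this model, deferral only grows when record_failure() is called. If failures
are only recorded for one kind of attempt (say, synchronous ones) and those
become rare, the window rarely grows, which is the concern raised above.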

- Can we improve coordination between direct reclaim and compaction? Both rely
mostly on watermark checks and estimation of fragmentation to decide whether to
reclaim or compact. Within compaction itself, the checks were found to be
inconsistent because important parameters (alloc_flags and classzone_idx) were
not available, which should now be fixed. But these parameters are still missing
from the reclaim vs compaction decisions. This could be a problem in
near-full-memory situations.
Can we also do something about parallel activity changing the conditions during
the compaction? E.g. we decide we have enough free memory to try compaction, but
then another process allocates it...

- Is the fundamental compaction algorithm sufficient? The migration scanner
starts at the zone beginning, the free scanner at the zone end. Testing shows
that with memory nearly full, they always meet somewhere around the middle of
the zone. But that means the migration scanner never sees the second half of the
zone and won't migrate movable pages out of unmovable pageblocks there, which
hurts fragmentation avoidance. Should we try to somehow move the scanner
starting points around the zone, so that all pageblocks get the same share of
the migration scanner on average?
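
The coverage problem can be illustrated with a small counting model. This is a
sketch under strong assumptions: it takes the observation above at face value
and assumes the scanners always meet after the migration scanner has covered
half of the zone, and the rotating start point is just one hypothetical policy
(advance by one pageblock per pass), not an existing or proposed implementation.

```python
def migrate_coverage(num_blocks, passes, rotate):
    """Count how often each pageblock is visited by the migration scanner.

    Simplifying assumption: on every pass the migration scanner covers exactly
    half of the zone before meeting the free scanner.
    With rotate=False every pass starts at pageblock 0.
    With rotate=True the start point advances by one pageblock per pass
    (a hypothetical policy) and the scan wraps around the zone end.
    """
    visits = [0] * num_blocks
    half = num_blocks // 2
    for p in range(passes):
        start = p % num_blocks if rotate else 0
        for i in range(half):
            visits[(start + i) % num_blocks] += 1
    return visits


fixed = migrate_coverage(8, 8, rotate=False)
rotating = migrate_coverage(8, 8, rotate=True)
```

With a fixed start, the second half of the zone gets no migration scanner
visits at all in this model, while the rotating start spreads the same number
of visits evenly over all pageblocks.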

Complementary to compaction is the fragmentation avoidance mechanism, which
Joonsoo and I are now also looking at. It's of course a heuristic and cannot be
perfect, unless it could predict the future. But can we do better to prevent
long-term unmovable allocations from polluting more pageblocks than needed?

- Should we perhaps sometimes decide that it's better to try migrating movable
pages out of current unmovable pageblocks than to place an unmovable allocation
into a movable pageblock?

- Would it be useful to introduce another migratetype, e.g. MIXED, to mark
pageblocks where unmovable allocations occurred but didn't steal enough free
pages to change the pageblock's migratetype to unmovable? The idea is that
further stealing would prefer MIXED pageblocks before polluting more clean
movable pageblocks.
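
As a rough illustration of the MIXED idea, the following toy model places
unmovable allocations with a preference order of unmovable, then MIXED, then
clean movable pageblocks. Everything here is an assumption for the sake of the
sketch: the dict-based pageblock representation, the pageblock size of 512
pages, and the rule that stealing at least half a pageblock converts it to
unmovable while a smaller steal only marks it MIXED.

```python
MOVABLE, MIXED, UNMOVABLE = "movable", "mixed", "unmovable"
PAGEBLOCK_PAGES = 512  # assumed pageblock size in pages


def place_unmovable(blocks, npages):
    """Place an unmovable allocation of npages; return the chosen block or None.

    blocks is a list of dicts with keys "type" and "free" (free pages).
    Preference order: unmovable, then MIXED, then clean movable pageblocks.
    """
    for wanted in (UNMOVABLE, MIXED, MOVABLE):
        for block in blocks:
            if block["type"] == wanted and block["free"] >= npages:
                block["free"] -= npages
                if wanted != UNMOVABLE:
                    # Hypothetical rule: a large steal converts the pageblock,
                    # a small one only marks it as MIXED.
                    big_steal = npages >= PAGEBLOCK_PAGES // 2
                    block["type"] = UNMOVABLE if big_steal else MIXED
                return block
    return None
```

In this model, once a small steal has marked one pageblock MIXED, later small
unmovable allocations keep landing in that same pageblock while it has free
pages, instead of spreading over further clean movable pageblocks.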

In case anyone's interested in more details, I've also written about the work
done on this topic (mostly from my perspective) during the last year, for our
SUSE Labs conference this September [2].

Vlastimil

[1] http://lwn.net/Articles/591998/
[2] http://labs.suse.cz/vbabka/compaction.pdf
