On 10/7/21 00:53, Kent Overstreet wrote:
> So I have some observations on memory compaction & hugepages.
>
> Right now, the working assumption in MM is that compaction is hard and
> expensive, and right now it is - because most allocations are order 0, with a
> small subset being hugepage order allocations. This means any time we need a
> hugepage, compaction has to move a bunch of order 0 pages around, and memory
> reclaim is no help here - when we reclaim memory, it's coming back as fragmented
> order 0 pages.
>
> But what if compaction wasn't such a difficult, expensive operation?
>
> With folios, and then folios for anonymous pages, we won't see nearly so many
> order 0 allocations anymore - we'll see a spread of allocation sizes based on a
> mixture of application usage patterns - something much closer to a Poisson
> distribution, vs. our current very bimodal distribution. And since we won't be
> fragmenting all our allocations up front, memory reclaim will be freeing
> allocations in this same distribution.

Unfortunately, the main problem with compaction is not the act of moving
a number of LRU pages, but rather the presence of unmovable pages (slab,
page tables and other kernel allocations), where a single such page makes
the whole 2MB block unusable. So I don't expect this would help
dramatically for compaction, but the points added by Matthew would still
apply.

> Which means that any time an order n allocation fails, it's likely that we'll
> still have order n-1 pages free - and of those free order n-1 pages, one will
> likely have a buddy that's moveable and hasn't been fragmented - meaning the
> common case is that compaction will have to move _one_ (higher order) page -
> we'll almost never be having to move a bunch of 4k pages.
>
> Another way of thinking of this is that memory reclaim will be doing most of the
> work that compaction has to do now to allocate a high order page. Compaction
> will go from an expensive, somewhat unreliable operation to one that mostly just
> works - it's going to be _much_ less of a pain point.
>
> It may turn out that allocating hugepages still doesn't work as reliably as we'd
> like - but folios are still a big help even when we can't allocate a 2MB page,
> because we'll be able to fall back to an order 6 or 7 or 8 allocation, which is
> something we can't do now. And, since multiple CPU vendors now support
> coalescing contiguous PTE entries in the TLB, this will still get us most of the
> performance benefits of using hugepages.
>
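
To make the two arguments above a bit more concrete, here is a minimal
Python sketch. It is a toy model, not kernel code: the function names are
made up for illustration, and it assumes 4KB base pages and 2MB blocks
(order 9). The first helper uses the usual XOR rule for pairing
buddy-allocator blocks, which is what "the buddy of a free order n-1 page"
refers to. The second shows why a few stray unmovable pages hurt so much:
each one rules out its whole 2MB block, however movable the rest is.

```python
# Toy model only; names and constants are illustrative assumptions.
PAGEBLOCK_ORDER = 9                      # assumed: 2MB block / 4KB pages
PAGEBLOCK_PAGES = 1 << PAGEBLOCK_ORDER   # 512 base pages per block


def buddy_pfn(pfn, order):
    """Return the start PFN of the buddy of the order-`order` block at `pfn`.

    Buddies of a given order differ only in bit `order` of their PFN.
    """
    return pfn ^ (1 << order)


def usable_blocks(num_pages, unmovable_pfns):
    """Return indices of 2MB blocks that contain no unmovable page.

    Only these blocks can be turned into a hugepage by migrating pages away.
    """
    blocked = {pfn // PAGEBLOCK_PAGES for pfn in unmovable_pfns}
    total = num_pages // PAGEBLOCK_PAGES
    return [block for block in range(total) if block not in blocked]


if __name__ == "__main__":
    # A free order-8 block at PFN 512: its buddy starts at PFN 768. If that
    # buddy is one movable order-8 folio, a single migration frees order 9.
    print(buddy_pfn(512, 8))

    # 16MB of memory (8 blocks) with three unmovable 4KB pages scattered in.
    print(usable_blocks(8 * PAGEBLOCK_PAGES, [5, 700, 2049]))
```

In this model three unmovable 4KB pages (12KB in total) take three of the
eight 2MB blocks out of play, and that does not change with the size of
the movable allocations around them.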