On 10.09.20 09:32, Michal Hocko wrote:
> [Cc Vlastimil and Mel - the whole email thread starts
> http://lkml.kernel.org/r/20200902180628.4052244-1-zi.yan@xxxxxxxx
> but this particular subthread has diverged a bit and you might find it
> interesting]
>
> On Wed 09-09-20 15:43:55, David Hildenbrand wrote:
>> On 09.09.20 15:19, Rik van Riel wrote:
>>> On Wed, 2020-09-09 at 09:04 +0200, Michal Hocko wrote:
>>>> On Tue 08-09-20 10:41:10, Rik van Riel wrote:
>>>>> On Tue, 2020-09-08 at 16:35 +0200, Michal Hocko wrote:
>>>>>
>>>>>> A global knob is insufficient. 1G pages will become a very precious
>>>>>> resource as it requires a pre-allocation (reservation). So it really
>>>>>> has to be an opt-in and the question is whether there is also some
>>>>>> sort of access control needed.
>>>>>
>>>>> The 1GB pages do not require that much in the way of
>>>>> pre-allocation. The memory can be obtained through CMA,
>>>>> which means it can be used for movable 4kB and 2MB
>>>>> allocations when not being used for 1GB pages.
>>>>
>>>> That CMA has to be pre-reserved, right? That requires a
>>>> configuration.
>>>
>>> To some extent, yes.
>>>
>>> However, because that pool can be used for movable
>>> 4kB and 2MB pages as well as for 1GB pages, it would be easy to just
>>> set the size of that pool to eg. 1/3 or even 1/2 of memory for every
>>> system.
>>>
>>> It isn't like the pool needs to be the exact right size. We
>>> just need to avoid the "highmem problem" of having too little
>>> memory for kernel allocations.
>>>
>>
>> I am not sure I like the trend towards CMA that we are seeing, reserving
>> huge buffers for specific users (and eventually even doing it
>> automatically).
>>
>> What we actually want is ZONE_MOVABLE with relaxed guarantees, such that
>> anybody who requires large, unmovable allocations can use it.
>>
>> I once played with the idea of having ZONE_PREFER_MOVABLE, which
>> a) Is the primary choice for movable allocations
>> b) Is allowed to contain unmovable allocations (esp., gigantic pages)
>> c) Is the fallback for ZONE_NORMAL for unmovable allocations, instead of
>> running out of memory
>
> I might be missing something but how can this work longterm? Or put in
> another words why would this work any better than existing fragmentation
> avoidance techniques that page allocator implements already - movability
> grouping etc. Please note that I am not deeply familiar with those but
> my high level understanding is that we already try hard to not mix
> movable and unmovable objects in same page blocks as much as we can.

Note that we group in pageblock granularity, which avoids fragmentation
on a pageblock level, not on anything bigger than that. That matters
especially for MAX_ORDER - 1 pages (e.g., on x86-64) and gigantic pages.

So once you run for some time on a system (especially thinking about
page shuffling *within* a zone), trying to allocate a gigantic page will
simply always fail - even if you always had plenty of free memory in
your single zone.

>
> My suspicion is that a separate zone would work in a similar fashion. As
> long as there is a lot of free memory then zone will be effectively
> MOVABLE. Similar applies to normal zone when unmovable allocations are

Note the difference to MOVABLE: if you really want, you *can* put
unmovable allocations into that zone. So you can happily allocate
gigantic pages from it. Or anything else you like. As the name suggests:
"prefer movable allocations".

> in minority. As long as the Normal zone gets full of unmovable objects
> they start overflowing to ZONE_PREFER_MOVABLE and it will resemble page
> block stealing when unmovable objects start spreading over movable page
> blocks.

Right, the long-term goal would be

1. To limit the chance of that happening.
(e.g., size it in a way that's safe for 99.9% of all setups, resize
dynamically on demand)
2. To limit the physical area where that is happening (e.g., find lowest
possible pageblock etc.). That's more tricky but I consider this a pure
optimization on top.

As long as we stay in safe zone boundaries you get a benefit in most
scenarios. As soon as we would have a (temporary) workload that would
require more unmovable allocations we would fall back to polluting some
pageblocks only.

>
> Again, my level of expertise to page allocator is quite low so all the
> above might be simply wrong...

Same over here. I had this idea in my mind for quite a while but
obviously didn't get to figure out the details/implement it yet - that's
why I decided to share the basic idea just now.

-- 
Thanks,

David / dhildenb
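
To illustrate the pageblock-granularity point above, here is a minimal
userspace sketch in C. It is not kernel code: the one-flag-per-pageblock
model, the function name find_gigantic_range() and the constant
PAGEBLOCKS_PER_GIGANTIC (512, assuming 2 MiB pageblocks and 1 GiB
gigantic pages as on x86-64) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 1 GiB gigantic page / 2 MiB pageblock = 512 pageblocks. */
#define PAGEBLOCKS_PER_GIGANTIC 512

/*
 * Hypothetical model of a zone: unmovable[i] is true if pageblock i holds
 * at least one unmovable allocation, so it cannot be emptied by migration.
 *
 * Returns the index of the first gigantic-page-aligned range that contains
 * no unmovable pageblock, or -1 if no such range exists.
 */
static long find_gigantic_range(const bool *unmovable, size_t nr_blocks)
{
    assert(unmovable != NULL);

    for (size_t start = 0;
         start + PAGEBLOCKS_PER_GIGANTIC <= nr_blocks;
         start += PAGEBLOCKS_PER_GIGANTIC) {
        bool clean = true;

        /* A single unmovable pageblock disqualifies the whole range. */
        for (size_t i = 0; i < PAGEBLOCKS_PER_GIGANTIC; i++) {
            if (unmovable[start + i]) {
                clean = false;
                break;
            }
        }
        if (clean)
            return (long)start;
    }
    return -1;
}
```

In this model, four unmovable pageblocks spread over a 4 GiB zone (one
per 1 GiB range) are enough to make every gigantic allocation fail, even
though more than 99% of the pageblocks are still movable.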