> There are three types of zones:
> 1. The first four zones partition the physical address space of CPU
> memory.
> 2. The device zone provides interoperability between CPU and device
> memory.
> 3. The movable zone commonly represents a memory allocation policy.
>
> Though originally designed for memory hot removal, the movable zone is
> instead widely used for other purposes, e.g., CMA and kdump kernel, on
> platforms that do not support hot removal, e.g., Android and ChromeOS.
> Nowadays, it is legitimately a zone independent of any physical
> characteristics. In spite of being somewhat regarded as a hack,
> largely due to the lack of a generic design concept for its true major
> use cases (on billions of client devices), the movable zone naturally
> resembles a policy (virtual) zone overlayed on the first four
> (physical) zones.
>
> This proposal formally generalizes this concept as policy zones so
> that additional policies can be implemented and enforced by subsequent
> zones after the movable zone. An inherited requirement of policy zones
> (and the first four zones) is that subsequent zones must be able to
> fall back to previous zones and therefore must add new properties to
> the previous zones rather than remove existing ones from them. Also,
> all properties must be known at the allocation time, rather than the
> runtime, e.g., memory object size and mobility are valid properties
> but hotness and lifetime are not.
>
> ZONE_MOVABLE becomes the first policy zone, followed by two new policy
> zones:
> 1. ZONE_NOSPLIT, which contains pages that are movable (inherited from
> ZONE_MOVABLE) and restricted to a minimum order to be
> anti-fragmentation. The latter means that they cannot be split down
> below that order, while they are free or in use.
> 2. ZONE_NOMERGE, which contains pages that are movable and restricted
> to an exact order.
> The latter means that not only is split
> prohibited (inherited from ZONE_NOSPLIT) but also merge (see the
> reason in Chapter Three), while they are free or in use.
>
> Since these two zones only can serve THP allocations (__GFP_MOVABLE |
> __GFP_COMP), they are called THP zones. Reclaim works seamlessly and
> compaction is not needed for these two zones.
>
> Compared with the hugeTLB pool approach, THP zones tap into core MM
> features including:
> 1. THP allocations can fall back to the lower zones, which can have
> higher latency but still succeed.
> 2. THPs can be either shattered (see Chapter Two) if partially
> unmapped or reclaimed if becoming cold.
> 3. THP orders can be much smaller than the PMD/PUD orders, e.g., 64KB
> contiguous PTEs on arm64 [1], which are more suitable for client
> workloads.
>
> Policy zones can be dynamically resized by offlining pages in one of
> them and onlining those pages in another of them. Note that this is
> only done among policy zones, not between a policy zone and a physical
> zone, since resizing is a (software) policy, not a physical
> characteristic.
>
> Implementing the same idea in the pageblock granularity has also been
> explored but rejected at Google. Pageblocks have a finer granularity
> and therefore can be more flexible than zones. The tradeoff is that
> this alternative implementation was more complex and failed to bring a
> better ROI. However, the rejection was mainly due to its inability to
> be smoothly extended to 1GB THPs [2], which is a planned use case of
> TAO.

We did implement a similar idea at pageblock granularity on OPPO's
phones by adding two special migratetypes [1]:

* QUAD_TO_TRIP - this is mainly for order-4 mTHP allocations, which can
  use ARM64's CONT-PTE; but it can rarely be split into order 3, to ease
  the pain of order-3 allocations, if and only if an order-3 allocation
  has failed in both the normal buddy and TRIP_TO_QUAD below.
* TRIP_TO_QUAD - this is mainly for order-4 mTHP allocations, which can
  use ARM64's CONT-PTE; but it can sometimes be split into order 3, to
  ease the pain of order-3 allocations, if and only if an order-3
  allocation has failed in the normal buddy.

Neither of the above is merged into order 5 or above, and neither is
split into order 2 or below. Compaction skips both of them.

One disadvantage I see with this approach is that I have to add a
separate LRU list in each zone to hold those mTHP folios. If mTHP and
small folios are put on the same LRU list, reclamation efficiency is
extremely bad. A separate zone, on the other hand, avoids the need for
a separate mTHP LRU list, because the new zone has its own LRU list.

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c

>
> [1] https://lore.kernel.org/20240215103205.2607016-1-ryan.roberts@xxxxxxx/
> [2] https://lore.kernel.org/20200928175428.4110504-1-zi.yan@xxxxxxxx/
>
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>

Thanks
Barry
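For illustration only, the order-3 fallback and order limits described for QUAD_TO_TRIP and TRIP_TO_QUAD can be modeled in a few lines of user-space C. This is a sketch under stated assumptions: the enum, struct, and function names are hypothetical and are not taken from the OPPO page_alloc.c. The model also ignores locking, per-cpu lists, and the order-4 allocation path.

```c
#include <stdbool.h>

/* Hypothetical pools modeling the migratetypes described above;
 * these identifiers are illustrative, not the real kernel ones. */
enum pool {
	POOL_NORMAL_BUDDY,
	POOL_TRIP_TO_QUAD,
	POOL_QUAD_TO_TRIP,
	POOL_NONE,
};

/* Assumed inputs: whether each pool can currently serve an order-3
 * request (directly for the normal buddy, by splitting an order-4
 * block for the two special pools). */
struct avail {
	bool normal;
	bool trip_to_quad;
	bool quad_to_trip;
};

/* Order-3 fallback: the normal buddy first, then TRIP_TO_QUAD (split
 * "sometimes"), and QUAD_TO_TRIP only as a last resort (split "rarely"). */
static enum pool pick_order3_pool(struct avail a)
{
	if (a.normal)
		return POOL_NORMAL_BUDDY;
	if (a.trip_to_quad)
		return POOL_TRIP_TO_QUAD;
	if (a.quad_to_trip)
		return POOL_QUAD_TO_TRIP;
	return POOL_NONE;
}

/* Blocks in either special pool are never merged into order 5 or above
 * and never split into order 2 or below, so only order 4 (native) and
 * order 3 (split from order 4) are valid there. The normal buddy has no
 * extra restriction in this model. */
static bool order_allowed(enum pool p, unsigned int order)
{
	if (p == POOL_TRIP_TO_QUAD || p == POOL_QUAD_TO_TRIP)
		return order == 3 || order == 4;
	return true;
}
```

In this model, an order-3 request with nothing free in the normal buddy is served from TRIP_TO_QUAD when it has a block. QUAD_TO_TRIP is touched only when both of the other sources are empty.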