Re: [Chapter One] THP zones: the use cases of policy zones


 



> There are three types of zones:
> 1. The first four zones partition the physical address space of CPU
>    memory.
> 2. The device zone provides interoperability between CPU and device
>    memory.
> 3. The movable zone commonly represents a memory allocation policy.
> 
> Though originally designed for memory hot removal, the movable zone is
> instead widely used for other purposes, e.g., CMA and kdump kernel, on
> platforms that do not support hot removal, e.g., Android and ChromeOS.
> Nowadays, it is legitimately a zone independent of any physical
> characteristics. In spite of being somewhat regarded as a hack,
> largely due to the lack of a generic design concept for its true major
> use cases (on billions of client devices), the movable zone naturally
> resembles a policy (virtual) zone overlaid on the first four
> (physical) zones.
> 
> This proposal formally generalizes this concept as policy zones so
> that additional policies can be implemented and enforced by subsequent
> zones after the movable zone. An inherited requirement of policy zones
> (and the first four zones) is that subsequent zones must be able to
> fall back to previous zones and therefore must add new properties to
> the previous zones rather than remove existing ones from them. Also,
> all properties must be known at the allocation time, rather than the
> runtime, e.g., memory object size and mobility are valid properties
> but hotness and lifetime are not.
> 
> ZONE_MOVABLE becomes the first policy zone, followed by two new policy
> zones:
> 1. ZONE_NOSPLIT, which contains pages that are movable (inherited from
>    ZONE_MOVABLE) and restricted to a minimum order to be
>    anti-fragmentation. The latter means that they cannot be split down
>    below that order, while they are free or in use.
> 2. ZONE_NOMERGE, which contains pages that are movable and restricted
>    to an exact order. The latter means that not only is split
>    prohibited (inherited from ZONE_NOSPLIT) but also merge (see the
>    reason in Chapter Three), while they are free or in use.
> 
> Since these two zones can only serve THP allocations (__GFP_MOVABLE |
> __GFP_COMP), they are called THP zones. Reclaim works seamlessly and
> compaction is not needed for these two zones.
> 
> Compared with the hugeTLB pool approach, THP zones tap into core MM
> features including:
> 1. THP allocations can fall back to the lower zones, which can have
>    higher latency but still succeed.
> 2. THPs can be either shattered (see Chapter Two) if partially
>    unmapped or reclaimed if becoming cold.
> 3. THP orders can be much smaller than the PMD/PUD orders, e.g., 64KB
>    contiguous PTEs on arm64 [1], which are more suitable for client
>    workloads.
> 
> Policy zones can be dynamically resized by offlining pages in one of
> them and onlining those pages in another of them. Note that this is
> only done among policy zones, not between a policy zone and a physical
> zone, since resizing is a (software) policy, not a physical
> characteristic.
> 
> Implementing the same idea in the pageblock granularity has also been
> explored but rejected at Google. Pageblocks have a finer granularity
> and therefore can be more flexible than zones. The tradeoff is that
> this alternative implementation was more complex and failed to bring a
> better ROI. However, the rejection was mainly due to its inability to
> be smoothly extended to 1GB THPs [2], which is a planned use case of
> TAO.

We did implement a similar idea at pageblock granularity on OPPO's
phones by adding two special migratetypes [1]:

* QUAD_TO_TRIP - this is mainly for order-4 mTHP allocations, which can use
ARM64's CONT-PTE. It can, but only rarely, be split into order-3 pages to
ease the pain of order-3 allocations, if and only if the order-3 allocation
has failed in both the normal buddy lists and TRIP_TO_QUAD below.
  
* TRIP_TO_QUAD - this is mainly for order-4 mTHP allocations, which can use
ARM64's CONT-PTE. It can sometimes be split into order-3 pages to ease the
pain of order-3 allocations, if and only if the order-3 allocation has
failed in the normal buddy lists.

Neither of the above will be merged into order 5 or above; neither of them
will be split into order 2 or lower.
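
To make the rules above concrete, here is a rough standalone sketch. It is
not the code from [1]: the enum values and the helpers can_split_to(),
can_merge_to() and pick_source_for_order3() are hypothetical names made up
for illustration, and it assumes 4KB base pages so that order 4 is the 64KB
CONT-PTE size.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers used in [1]. */
enum mthp_mt { MT_NORMAL, MT_TRIP_TO_QUAD, MT_QUAD_TO_TRIP };

#define MTHP_ORDER     4  /* 16 pages: CONT-PTE size, assuming 4KB pages */
#define MTHP_MIN_ORDER 3  /* lowest order a special block may be split to */

/* Special blocks may only be split down to order 3, never to order 2 or lower. */
static bool can_split_to(enum mthp_mt mt, int order)
{
	if (mt == MT_NORMAL)
		return order >= 0;
	return order >= MTHP_MIN_ORDER && order <= MTHP_ORDER;
}

/* Special blocks are never merged into order 5 or above. */
static bool can_merge_to(enum mthp_mt mt, int order)
{
	if (mt == MT_NORMAL)
		return true;  /* upper limit of the normal buddy is not modeled */
	return order <= MTHP_ORDER;
}

/*
 * Fallback sequence for an order-3 allocation:
 * normal buddy first, then TRIP_TO_QUAD, and QUAD_TO_TRIP only as a last
 * resort. Each flag says whether that source can currently satisfy the
 * request. Returns the chosen migratetype, or -1 if all three fail.
 */
static int pick_source_for_order3(bool normal_ok, bool t2q_ok, bool q2t_ok)
{
	if (normal_ok)
		return MT_NORMAL;
	if (t2q_ok)
		return MT_TRIP_TO_QUAD;
	if (q2t_ok)
		return MT_QUAD_TO_TRIP;
	return -1;
}
```

For example, pick_source_for_order3(false, true, true) picks TRIP_TO_QUAD,
so QUAD_TO_TRIP is only touched once both earlier sources have failed.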

In compaction, we skip both of the above. One disadvantage I see with
this approach is that I have to add a separate LRU list in each
zone to hold those mTHP folios. If mTHP and small folios are put
on the same LRU list, reclamation efficiency is extremely bad.

A separate zone, on the other hand, can avoid a separate LRU list
for mTHP as the new zone has its own LRU list.

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c

> 
> [1] https://lore.kernel.org/20240215103205.2607016-1-ryan.roberts@xxxxxxx/
> [2] https://lore.kernel.org/20200928175428.4110504-1-zi.yan@xxxxxxxx/
> 
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>

Thanks
Barry




