On Tue, Nov 12, 2024 at 5:43 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>
>
>
> On 08/11/2024 06:51, Barry Song wrote:
> > On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >> Hi, Barry,
> >>
> >> Barry Song <21cnbao@xxxxxxxxx> writes:
> >>
> >>> From: Barry Song <v-songbaohua@xxxxxxxx>
> >>>
> >>> When large folios are compressed at a larger granularity, we observe
> >>> a notable reduction in CPU usage and a significant improvement in
> >>> compression ratios.
> >>>
> >>> mTHP's ability to be swapped out without splitting and swapped back in
> >>> as a whole allows compression and decompression at larger granularities.
> >>>
> >>> This patchset enhances zsmalloc and zram by adding support for dividing
> >>> large folios into multi-page blocks, typically configured with a
> >>> 2-order granularity. Without this patchset, a large folio is always
> >>> divided into `nr_pages` 4KiB blocks.
> >>>
> >>> The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> >>> setting, where the default of 2 allows all anonymous THP to benefit.
> >>>
> >>> Examples include:
> >>> * A 16KiB large folio will be compressed and stored as a single
> >>>   16KiB block.
> >>> * A 64KiB large folio will be compressed and stored as four 16KiB
> >>>   blocks.
> >>>
> >>> For example, swapping out and swapping in 100MiB of typical anonymous
> >>> data 100 times (with 16KB mTHP enabled) using zstd yields the following
> >>> results:
> >>>
> >>>                        w/o patches    w/ patches
> >>> swap-out time(ms)      68711          49908
> >>> swap-in time(ms)       30687          20685
> >>> compression ratio      20.49%         16.9%
> >>
> >> The data looks good. Thanks!
> >>
> >> Have you considered the situation that the large folio fails to be
> >> allocated during swap-in? It's possible because the memory may be very
> >> fragmented.
> >
> > That's correct, good question. On phones, we use a large folio pool to
> > maintain a relatively high allocation success rate. When mTHP allocation
> > fails, we have a workaround to allocate nr_pages of small folios and map
> > them together to avoid partial reads. This ensures that the benefits of
> > larger block compression and decompression are consistently maintained.
> > That was the code running on production phones.
> >
>
> Thanks for sending the v2!
>
> How is the large folio pool maintained? I don't think there is something in upstream

In production phones, we have extended the migration type for mTHP
separately during Linux boot[1].

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c#L2089

These pageblocks have their own migration type, resulting in a separate
buddy free list. We prevent order-0 allocations from drawing memory from
this pool, ensuring a relatively high success rate for mTHP allocations.

In one instance, phones reported an mTHP allocation success rate of less
than 5% after running for a few hours without this kind of reservation
mechanism. Therefore, we need an upstream solution in the kernel to
ensure sustainable mTHP support across all scenarios.

> kernel for this? The only thing that I saw on the mailing list is TAO for pmd-mappable
> THPs only? I think that was about 7-8 months ago and wasn't merged?

TAO supports mTHP as long as it is configured through the bootcmd:

  nomerge=25%,4

This means we are providing a 4-order mTHP pool with 25% of total memory
reserved.
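Whether the pool comes from the extended migration type in [1] or from
TAO's nomerge pool, the policy is the same. As a rough illustration only
(plain C with hypothetical names, not the actual code in [1] or in TAO):
pageblocks carrying a dedicated migrate type feed their own free list,
and order-0 requests are never allowed to fall back to it, so higher-order
mTHP requests keep succeeding even when the rest of memory is fragmented.

/*
 * Illustrative sketch only -- hypothetical names, not the code in [1] or TAO.
 * Pageblocks tagged with a reserved migrate type feed a separate free list;
 * order-0 requests never fall back to it, so order-2 (16KiB) mTHP
 * allocations keep succeeding even when the rest of memory is fragmented.
 */
#include <stdbool.h>
#include <stddef.h>

enum migrate_type {
	MT_MOVABLE,		/* ordinary movable pageblocks            */
	MT_MTHP_RESERVED,	/* hypothetical reserved mTHP pageblocks  */
	MT_NR,
};

struct free_area {
	size_t nr_free[MT_NR];	/* free blocks per migrate type */
};

#define MTHP_POOL_ORDER 2	/* mirrors the default ZSMALLOC_MULTI_PAGES_ORDER */

/* May a request of @order take blocks from migrate type @mt? */
static bool can_use_pool(unsigned int order, unsigned int mt)
{
	if (mt != MT_MTHP_RESERVED)
		return true;
	/* Keep order-0 (and anything below the pool order) out of the pool. */
	return order >= MTHP_POOL_ORDER;
}

/* Returns true if a block of @order was taken from @area, false otherwise. */
static bool alloc_block(struct free_area *area, unsigned int order)
{
	for (unsigned int mt = 0; mt < MT_NR; mt++) {
		if (!can_use_pool(order, mt) || area->nr_free[mt] == 0)
			continue;
		area->nr_free[mt]--;
		return true;
	}
	return false;	/* caller falls back to smaller folios */
}

In the real kernel this is of course done with pageblock migrate types and
the buddy allocator's fallback lists; the sketch only captures the policy
that keeps order-0 allocations from eating the reserve.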
Note that the Android common kernel has already integrated TAO[2][3], so
we are trying to use TAO to replace our previous approach of extending
the migration type.

[2] https://android.googlesource.com/kernel/common/+/c1ff6dcf209e4abc23584d2cd117f725421bccac
[3] https://android.googlesource.com/kernel/common/+/066872d13d0c0b076785f0b794b650de0941c1c9

> The workaround to allocate nr_pages of small folios and map them
> together to avoid partial reads is also not upstream, right?

Correct. It's running on the phones[4][5], but I still don't know how to
handle it upstream properly.

[4] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L4656
[5] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L5439

>
> Do you have any data on how this would perform with the upstream kernel,
> i.e. without a large folio pool and the workaround, and if large
> granularity compression is worth having without those patches?

I'd say large granularity compression isn't a problem, but large
granularity decompression could be. The worst case would be if we swap
out a large block, such as 16KB, but end up swapping in 4 times due to
allocation failures, falling back to smaller folios. In this scenario,
we would need to perform three redundant decompressions. I will work
with Tangquan to provide this data this week.

But once we swap in small folios, they remain small (we can't collapse
them into mTHP). As a result, the next time, they will be swapped out
and swapped in as small folios. Therefore, this potential loss is
one-time.

>
> Thanks,
> Usama
>
> > We also previously experimented with maintaining multiple buffers for
> > decompressed large blocks in zRAM, allowing upcoming do_swap_page()
> > calls to use them when falling back to small folios. In this setup,
> > the buffers achieved a high hit rate, though I don't recall the exact
> > number.
> >
> > I'm concerned that this fault-around-like fallback to nr_pages small
> > folios may not gain traction upstream. Do you have any suggestions
> > for improvement?
> >
> >>
> >>> -v2:
> >>> While it is not mature yet, I know some people are waiting for
> >>> an update :-)
> >>> * Fixed some stability issues.
> >>> * rebase against the latest mm-unstable.
> >>> * Set default order to 2 which benefits all anon mTHP.
> >>> * multipages ZsPageMovable is not supported yet.
> >>>
> >>> Tangquan Zheng (2):
> >>>   mm: zsmalloc: support objects compressed based on multiple pages
> >>>   zram: support compression at the granularity of multi-pages
> >>>
> >>>  drivers/block/zram/Kconfig    |   9 +
> >>>  drivers/block/zram/zcomp.c    |  17 +-
> >>>  drivers/block/zram/zcomp.h    |  12 +-
> >>>  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
> >>>  drivers/block/zram/zram_drv.h |  45 ++++
> >>>  include/linux/zsmalloc.h      |  10 +-
> >>>  mm/Kconfig                    |  18 ++
> >>>  mm/zsmalloc.c                 | 232 +++++++++++++-----
> >>>  8 files changed, 699 insertions(+), 94 deletions(-)
> >>
> >> --
> >> Best Regards,
> >> Huang, Ying
> >

Thanks
barry