Barry Song <21cnbao@xxxxxxxxx> writes:

> On Tue, Nov 12, 2024 at 2:11 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>
>> Barry Song <21cnbao@xxxxxxxxx> writes:
>>
>> > On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>> >>
>> >> Hi, Barry,
>> >>
>> >> Barry Song <21cnbao@xxxxxxxxx> writes:
>> >>
>> >> > From: Barry Song <v-songbaohua@xxxxxxxx>
>> >> >
>> >> > When large folios are compressed at a larger granularity, we observe
>> >> > a notable reduction in CPU usage and a significant improvement in
>> >> > compression ratios.
>> >> >
>> >> > mTHP's ability to be swapped out without splitting and swapped back in
>> >> > as a whole allows compression and decompression at larger granularities.
>> >> >
>> >> > This patchset enhances zsmalloc and zram by adding support for dividing
>> >> > large folios into multi-page blocks, typically configured with a
>> >> > 2-order granularity. Without this patchset, a large folio is always
>> >> > divided into `nr_pages` 4KiB blocks.
>> >> >
>> >> > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
>> >> > setting, where the default of 2 allows all anonymous THP to benefit.
>> >> >
>> >> > Examples include:
>> >> > * A 16KiB large folio will be compressed and stored as a single 16KiB
>> >> >   block.
>> >> > * A 64KiB large folio will be compressed and stored as four 16KiB
>> >> >   blocks.
>> >> >
>> >> > For example, swapping out and swapping in 100MiB of typical anonymous
>> >> > data 100 times (with 16KB mTHP enabled) using zstd yields the following
>> >> > results:
>> >> >
>> >> >                         w/o patches    w/ patches
>> >> > swap-out time(ms)       68711          49908
>> >> > swap-in time(ms)        30687          20685
>> >> > compression ratio       20.49%         16.9%
>> >>
>> >> The data looks good. Thanks!
>> >>
>> >> Have you considered the situation that the large folio fails to be
>> >> allocated during swap-in? It's possible because the memory may be very
>> >> fragmented.
>> >
>> > That's correct, good question. On phones, we use a large folio pool to
>> > maintain a relatively high allocation success rate. When mTHP allocation
>> > fails, we have a workaround to allocate nr_pages of small folios and map
>> > them together to avoid partial reads. This ensures that the benefits of
>> > larger block compression and decompression are consistently maintained.
>> > That was the code running on production phones.
>> >
>> > We also previously experimented with maintaining multiple buffers for
>> > decompressed large blocks in zRAM, allowing upcoming do_swap_page()
>> > calls to use them when falling back to small folios. In this setup, the
>> > buffers achieved a high hit rate, though I don't recall the exact number.
>> >
>> > I'm concerned that this fault-around-like fallback to nr_pages small
>> > folios may not gain traction upstream. Do you have any suggestions for
>> > improvement?
>>
>> It appears that we still haven't a solution to guarantee 100% mTHP
>> allocation success rate. If so, we need a fallback solution for that.
>>
>> Another possible solution is,
>>
>> 1) If failed to allocate mTHP with nr_pages, allocate nr_pages normal (4k)
>>    folios instead
>>
>> 2) Revise the decompression interface to accept a set of folios (instead
>>    of one folio) as target. Then, we can decompress to the normal
>>    folios allocated in 1).
>>
>> 3) in do_swap_page(), we can either map all folios or just the fault
>>    folios. We can put non-fault folios into swap cache if necessary.
>>
>> Does this work?
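
To make 1)-3) above a bit more concrete, here is a rough sketch of the
fallback flow. It is only pseudo-code: zram_decompress_multi() and
MULTI_NR_PAGES are made-up names for the revised decompression interface
and the block size, not existing zram symbols, and for simplicity the
sketch decompresses into a bounce buffer and then scatters into the
order-0 folios, while 2) would teach the decompressor to target the folio
array directly. The mapping/swap-cache part of 3) is left to the caller.

/*
 * Rough sketch only, not tied to the current zram read path.
 * zram_decompress_multi() stands for the revised interface in 2): it
 * decompresses one multi-page object into a caller-provided buffer.
 */
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/slab.h>
#include <linux/swap.h>

#define MULTI_NR_PAGES	4	/* illustrative, i.e. order-2 blocks */

static int swapin_fallback_small_folios(swp_entry_t entry,
					struct folio **folios)
{
	void *buf;
	int i, err = -ENOMEM;

	/* 1) mTHP allocation failed, so use nr_pages order-0 folios. */
	for (i = 0; i < MULTI_NR_PAGES; i++) {
		folios[i] = folio_alloc(GFP_HIGHUSER_MOVABLE, 0);
		if (!folios[i])
			goto err_put;
	}

	/* 2) Decompress the whole multi-page object once ... */
	buf = kmalloc(MULTI_NR_PAGES * PAGE_SIZE, GFP_KERNEL);
	if (!buf)
		goto err_put;
	err = zram_decompress_multi(entry, buf);	/* made-up name */
	if (err)
		goto err_free;

	/* ... and scatter it into the order-0 folios. */
	for (i = 0; i < MULTI_NR_PAGES; i++)
		memcpy_to_folio(folios[i], 0, buf + i * PAGE_SIZE, PAGE_SIZE);

	kfree(buf);
	/*
	 * 3) The caller maps the fault folio and either maps the other
	 * folios as well or puts them into the swap cache.
	 */
	return 0;

err_free:
	kfree(buf);
err_put:
	while (--i >= 0)
		folio_put(folios[i]);
	return err;
}

A real implementation would presumably avoid the extra copy by letting
the decompressor write into the folio array directly; the bounce buffer
is only there to keep the sketch short.
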
>
> this is exactly what we did in production phones:

I think that this is upstreamable.

> [1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L4656
> [2] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L5439
>
> I feel that we don't need to fall back to nr_pages (though that's what
> we used on phones); using a dedicated 4 should be sufficient, as if
> zsmalloc is handling compression and decompression of 16KB.

Yes. We only need the number of normal folios to make decompress work.

> However, we are not adding them to the swapcache; instead, they are
> mapped immediately.

I think that works.

>>
>> >>
>> >> > -v2:
>> >> > While it is not mature yet, I know some people are waiting for
>> >> > an update :-)
>> >> > * Fixed some stability issues.
>> >> > * rebase againest the latest mm-unstable.
>> >> > * Set default order to 2 which benefits all anon mTHP.
>> >> > * multipages ZsPageMovable is not supported yet.
>> >> >
>> >> > Tangquan Zheng (2):
>> >> >   mm: zsmalloc: support objects compressed based on multiple pages
>> >> >   zram: support compression at the granularity of multi-pages
>> >> >
>> >> >  drivers/block/zram/Kconfig    |   9 +
>> >> >  drivers/block/zram/zcomp.c    |  17 +-
>> >> >  drivers/block/zram/zcomp.h    |  12 +-
>> >> >  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
>> >> >  drivers/block/zram/zram_drv.h |  45 ++++
>> >> >  include/linux/zsmalloc.h      |  10 +-
>> >> >  mm/Kconfig                    |  18 ++
>> >> >  mm/zsmalloc.c                 | 232 +++++++++++++-----
>> >> >  8 files changed, 699 insertions(+), 94 deletions(-)
>> >>

--
Best Regards,
Huang, Ying