On Tue, Nov 12, 2024 at 2:11 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Barry Song <21cnbao@xxxxxxxxx> writes:
>
> > On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >> Hi, Barry,
> >>
> >> Barry Song <21cnbao@xxxxxxxxx> writes:
> >>
> >> > From: Barry Song <v-songbaohua@xxxxxxxx>
> >> >
> >> > When large folios are compressed at a larger granularity, we observe
> >> > a notable reduction in CPU usage and a significant improvement in
> >> > compression ratios.
> >> >
> >> > mTHP's ability to be swapped out without splitting and swapped back in
> >> > as a whole allows compression and decompression at larger granularities.
> >> >
> >> > This patchset enhances zsmalloc and zram by adding support for dividing
> >> > large folios into multi-page blocks, typically configured with a
> >> > 2-order granularity. Without this patchset, a large folio is always
> >> > divided into `nr_pages` 4KiB blocks.
> >> >
> >> > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> >> > setting, where the default of 2 allows all anonymous THP to benefit.
> >> >
> >> > Examples include:
> >> > * A 16KiB large folio will be compressed and stored as a single 16KiB
> >> >   block.
> >> > * A 64KiB large folio will be compressed and stored as four 16KiB
> >> >   blocks.
> >> >
> >> > For example, swapping out and swapping in 100MiB of typical anonymous
> >> > data 100 times (with 16KB mTHP enabled) using zstd yields the following
> >> > results:
> >> >
> >> >                    w/o patches   w/ patches
> >> > swap-out time(ms)        68711        49908
> >> > swap-in time(ms)         30687        20685
> >> > compression ratio       20.49%        16.9%
> >>
> >> The data looks good. Thanks!
> >>
> >> Have you considered the situation that the large folio fails to be
> >> allocated during swap-in? It's possible because the memory may be very
> >> fragmented.
> >
> > That's correct, good question. On phones, we use a large folio pool to
> > maintain a relatively high allocation success rate. When mTHP allocation
> > fails, we have a workaround to allocate nr_pages of small folios and map
> > them together to avoid partial reads. This ensures that the benefits of
> > larger block compression and decompression are consistently maintained.
> > That was the code running on production phones.
> >
> > We also previously experimented with maintaining multiple buffers for
> > decompressed large blocks in zRAM, allowing upcoming do_swap_page()
> > calls to use them when falling back to small folios. In this setup, the
> > buffers achieved a high hit rate, though I don't recall the exact number.
> >
> > I'm concerned that this fault-around-like fallback to nr_pages small
> > folios may not gain traction upstream. Do you have any suggestions for
> > improvement?
>
> It appears that we still haven't a solution to guarantee 100% mTHP
> allocation success rate. If so, we need a fallback solution for that.
>
> Another possible solution is,
>
> 1) If failed to allocate mTHP with nr_pages, allocate nr_pages normal (4k)
>    folios instead
>
> 2) Revise the decompression interface to accept a set of folios (instead
>    of one folio) as target. Then, we can decompress to the normal
>    folios allocated in 1).
>
> 3) in do_swap_page(), we can either map all folios or just the fault
>    folios. We can put non-fault folios into swap cache if necessary.
>
> Does this work?
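Yes, I think that works. For step 2), I imagine the interface could look
roughly like the sketch below. To be clear, this is only a hypothetical
sketch: struct decomp_target, zram_decompress_to_folios() and
zram_decompress_obj() are made-up names, and the scratch buffer is just
the simplest way to show the idea, not necessarily how it should be
implemented:

struct decomp_target {
	struct folio **folios;	/* the nr order-0 folios from step 1) */
	int nr;			/* e.g. 4 for a 16KiB object */
};

static int zram_decompress_to_folios(struct zram *zram, u32 index,
				     struct decomp_target *tgt)
{
	size_t size = tgt->nr * PAGE_SIZE;
	void *buf = kmalloc(size, GFP_NOIO);
	int i, ret;

	if (!buf)
		return -ENOMEM;

	/* decompress the whole multi-page object into a scratch buffer */
	ret = zram_decompress_obj(zram, index, buf, size);

	/* then scatter it page by page into the caller's folios */
	for (i = 0; !ret && i < tgt->nr; i++)
		memcpy_to_folio(tgt->folios[i], 0,
				buf + i * PAGE_SIZE, PAGE_SIZE);

	kfree(buf);
	return ret;
}

(A refinement could vmap() the four pages and let the decompressor write
into them directly, avoiding the extra copy.)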
This is exactly what we did on production phones:

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L4656
[2] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L5439

I feel we don't need to fall back to nr_pages small folios (though that
is what we used on phones); a fixed four should be sufficient, since
zsmalloc compresses and decompresses at 16KiB granularity. However, we
do not add them to the swapcache; instead, they are mapped immediately.

>
> >>
> >> > -v2:
> >> > While it is not mature yet, I know some people are waiting for
> >> > an update :-)
> >> > * Fixed some stability issues.
> >> > * rebase againest the latest mm-unstable.
> >> > * Set default order to 2 which benefits all anon mTHP.
> >> > * multipages ZsPageMovable is not supported yet.
> >> >
> >> > Tangquan Zheng (2):
> >> >   mm: zsmalloc: support objects compressed based on multiple pages
> >> >   zram: support compression at the granularity of multi-pages
> >> >
> >> >  drivers/block/zram/Kconfig    |   9 +
> >> >  drivers/block/zram/zcomp.c    |  17 +-
> >> >  drivers/block/zram/zcomp.h    |  12 +-
> >> >  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
> >> >  drivers/block/zram/zram_drv.h |  45 ++++
> >> >  include/linux/zsmalloc.h      |  10 +-
> >> >  mm/Kconfig                    |  18 ++
> >> >  mm/zsmalloc.c                 | 232 +++++++++++++----
> >> >  8 files changed, 699 insertions(+), 94 deletions(-)
> >> >
>
> --
> Best Regards,
> Huang, Ying

Thanks
Barry
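P.S. For completeness, the fallback we carry in the production tree is
conceptually along the lines of the sketch below, reusing the
decomp_target idea from above. Again, this is heavily simplified with
made-up names (swap_in_order2_fallback() does not exist anywhere); the
real code is at [1] and [2]:

#define MULTI_PAGES_NR 4	/* one order-2 object covers 4 pages */

static int swap_in_order2_fallback(struct zram *zram, u32 index,
				   struct folio **folios)
{
	int i, ret = -ENOMEM;

	/* 16KiB mTHP allocation failed: take 4 order-0 folios instead */
	for (i = 0; i < MULTI_PAGES_NR; i++) {
		/* a real implementation would use the VMA-aware allocator */
		folios[i] = folio_alloc(GFP_HIGHUSER_MOVABLE, 0);
		if (!folios[i])
			goto err;
	}

	/* one 16KiB decompression fills all four folios */
	struct decomp_target tgt = {
		.folios	= folios,
		.nr	= MULTI_PAGES_NR,
	};

	ret = zram_decompress_to_folios(zram, index, &tgt);
	if (ret)
		goto err;

	/*
	 * The caller then maps all four PTEs at once (set_ptes()) instead
	 * of putting the three non-fault folios into the swapcache, so
	 * they never refault or trigger extra partial decompressions.
	 */
	return 0;
err:
	while (--i >= 0)
		folio_put(folios[i]);
	return ret;
}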