Barry Song <21cnbao@xxxxxxxxx> writes:

> On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>
>> Hi, Barry,
>>
>> Barry Song <21cnbao@xxxxxxxxx> writes:
>>
>> > From: Barry Song <v-songbaohua@xxxxxxxx>
>> >
>> > When large folios are compressed at a larger granularity, we observe
>> > a notable reduction in CPU usage and a significant improvement in
>> > compression ratios.
>> >
>> > mTHP's ability to be swapped out without splitting and swapped back in
>> > as a whole allows compression and decompression at larger granularities.
>> >
>> > This patchset enhances zsmalloc and zram by adding support for dividing
>> > large folios into multi-page blocks, typically configured with a
>> > 2-order granularity. Without this patchset, a large folio is always
>> > divided into `nr_pages` 4KiB blocks.
>> >
>> > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
>> > setting, where the default of 2 allows all anonymous THP to benefit.
>> >
>> > Examples include:
>> > * A 16KiB large folio will be compressed and stored as a single 16KiB
>> >   block.
>> > * A 64KiB large folio will be compressed and stored as four 16KiB
>> >   blocks.
>> >
>> > For example, swapping out and swapping in 100MiB of typical anonymous
>> > data 100 times (with 16KB mTHP enabled) using zstd yields the following
>> > results:
>> >
>> >                         w/o patches    w/ patches
>> > swap-out time(ms)       68711          49908
>> > swap-in time(ms)        30687          20685
>> > compression ratio       20.49%         16.9%
>>
>> The data looks good. Thanks!
>>
>> Have you considered the situation that the large folio fails to be
>> allocated during swap-in? It's possible because the memory may be very
>> fragmented.
>
> That's correct, good question. On phones, we use a large folio pool to
> maintain a relatively high allocation success rate. When mTHP allocation
> fails, we have a workaround to allocate nr_pages small folios and map
> them together to avoid partial reads. This ensures that the benefits of
> larger block compression and decompression are consistently maintained.
> That is the code running on production phones.
>
> We also previously experimented with maintaining multiple buffers for
> decompressed large blocks in zRAM, allowing upcoming do_swap_page()
> calls to use them when falling back to small folios. In this setup, the
> buffers achieved a high hit rate, though I don't recall the exact
> number.
>
> I'm concerned that this fault-around-like fallback to nr_pages small
> folios may not gain traction upstream. Do you have any suggestions for
> improvement?

It appears that we still don't have a solution to guarantee a 100% mTHP
allocation success rate. If so, we need a fallback solution for that.
Another possible solution is:

1) If we fail to allocate an mTHP of nr_pages, allocate nr_pages normal
   (4k) folios instead.

2) Revise the decompression interface to accept a set of folios (instead
   of one folio) as the target. Then, we can decompress into the normal
   folios allocated in 1).

3) In do_swap_page(), we can either map all folios or just the faulting
   folio. We can put the non-fault folios into the swap cache if
   necessary.

Does this work?
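A very rough, untested sketch of 1) and 2) might look like the following.
All names here (zram_read_multi_pages(), MULTI_PAGES_ORDER, etc.) are made
up purely for illustration and are not taken from the patchset; the point
is only that the decompression target becomes an array of pages rather
than a single large folio, so the whole block is still decompressed in one
call even when the mTHP allocation falls back.

/*
 * Illustrative only: zram_read_multi_pages() and MULTI_PAGES_ORDER are
 * placeholder names, not the interface added by this patchset.
 */
#include <linux/gfp.h>
#include <linux/mm.h>

#define MULTI_PAGES_ORDER	2
#define MULTI_PAGES_NR		(1 << MULTI_PAGES_ORDER)  /* 4 x 4KiB = 16KiB block */

struct zram;

/* Assumed helper: decompress one multi-page block into @nr destination pages. */
int zram_read_multi_pages(struct zram *zram, struct page **dst, int nr,
			  u32 index);

/*
 * Fallback for swap-in when allocating a large folio fails: allocate
 * nr_pages order-0 folios and decompress the whole block into them, so
 * the larger-block decompression benefit is kept.
 */
static int swapin_multi_pages_fallback(struct zram *zram, u32 index,
				       struct folio **folios)
{
	struct page *pages[MULTI_PAGES_NR];
	int i, err = -ENOMEM;

	/* 1) mTHP allocation failed: fall back to nr_pages order-0 folios. */
	for (i = 0; i < MULTI_PAGES_NR; i++) {
		folios[i] = folio_alloc(GFP_HIGHUSER_MOVABLE, 0);
		if (!folios[i])
			goto err_put;
		pages[i] = folio_page(folios[i], 0);
	}

	/* 2) Decompress the whole block into the set of small folios. */
	err = zram_read_multi_pages(zram, pages, MULTI_PAGES_NR, index);
	if (err)
		goto err_put;

	/*
	 * 3) The caller (e.g. do_swap_page()) can then map only the faulting
	 *    folio and put the others into the swap cache, or map all of them.
	 */
	return 0;

err_put:
	while (--i >= 0)
		folio_put(folios[i]);
	return err;
}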
>>
>> > -v2:
>> >  While it is not mature yet, I know some people are waiting for
>> >  an update :-)
>> >  * Fixed some stability issues.
>> >  * Rebase against the latest mm-unstable.
>> >  * Set default order to 2 which benefits all anon mTHP.
>> >  * multipages ZsPageMovable is not supported yet.
>> >
>> > Tangquan Zheng (2):
>> >   mm: zsmalloc: support objects compressed based on multiple pages
>> >   zram: support compression at the granularity of multi-pages
>> >
>> >  drivers/block/zram/Kconfig    |   9 +
>> >  drivers/block/zram/zcomp.c    |  17 +-
>> >  drivers/block/zram/zcomp.h    |  12 +-
>> >  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
>> >  drivers/block/zram/zram_drv.h |  45 ++++
>> >  include/linux/zsmalloc.h      |  10 +-
>> >  mm/Kconfig                    |  18 ++
>> >  mm/zsmalloc.c                 | 232 +++++++++++++-----
>> >  8 files changed, 699 insertions(+), 94 deletions(-)
>>

--
Best Regards,
Huang, Ying