Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io devices

Hi Barry,

On Thu, Aug 22, 2024 at 05:13:06AM GMT, Barry Song wrote:
> On Thu, Aug 22, 2024 at 1:31 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> >
> > On Wed, Aug 21, 2024 at 03:45:40PM GMT, hanchuanhua@xxxxxxxx wrote:
> > > From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> > >
> > >
> > > 3. With both mTHP swap-out and swap-in supported, we offer the option to enable
> > >    zsmalloc compression/decompression with larger granularity[2]. The upcoming
> > >    optimization in zsmalloc will significantly increase swap speed and improve
> > >    compression efficiency. Tested by running 100 iterations of swapping 100MiB
> > >    of anon memory, the swap speed improved dramatically:
> > >                 time consumption of swapin(ms)   time consumption of swapout(ms)
> > >      lz4 4k                  45274                    90540
> > >      lz4 64k                 22942                    55667
> > >      zstdn 4k                85035                    186585
> > >      zstdn 64k               46558                    118533
> >
> > Are the above numbers with the patch series at [2] or without? Also, can
> > you explain your experiment setup and how someone can reproduce these?
> 
> Hi Shakeel,
> 
> The data was recorded after applying both this patch series (mTHP swap-in)
> and patch [2] (compressing/decompressing whole mTHPs instead of individual
> pages). However, without the swap-in series, patch [2] becomes useless because:
> 
> If we have a large object in zsmalloc, for example one spanning 16 pages,
> do_swap_page() will happen 16 times:
> 1. decompress the whole large object and copy one page;
> 2. decompress the whole large object and copy one page;
> 3. decompress the whole large object and copy one page;
> ....
> 16.  decompress the whole large object and copy one page;
> 
> So, patchset [2] will actually degrade performance rather than
> enhance it if we don't have this swap-in series. This swap-in
> series is a prerequisite for the zsmalloc/zram series.
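> 
> To make it concrete, here is a rough pseudo-code sketch of the two paths.
> The helper names decompress_object()/copy_page_from_buf() are made up for
> illustration and are not the actual zram/zsmalloc code; assume a 64KiB
> object spanning 16 pages:
> 
>     /* without mTHP swap-in: one 4KiB fault at a time */
>     for (i = 0; i < 16; i++) {
>             /* each do_swap_page() call ends up here */
>             decompress_object(obj, buf);            /* whole 64KiB decompressed */
>             copy_page_from_buf(page[i], buf, i);    /* only 4KiB is kept */
>     }
> 
>     /* with mTHP swap-in: one fault for the whole 64KiB folio */
>     decompress_object(obj, folio_buf);              /* decompressed exactly once */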

Thanks for the explanation.

> 
> We reproduced the data through the following simple steps:
> 1. Collected anonymous pages from a running phone and saved them to a file.
> 2. Used a small program to open and read the file into mapped anonymous
> memory.
> 3. Did the following in the small program:
> swapout_start_time
> madv_pageout()
> swapout_end_time
> 
> swapin_start_time
> read_data()
> swapin_end_time
> 
> We calculate the throughput of swapout and swapin using the difference between
> end_time and start_time. Additionally, we record the memory usage of zram after
> the swapout is complete.
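> 
> For reference, a minimal userspace sketch of that measurement loop. This is
> the assumed flow only: the real program reads the saved anon data from a
> file, while this sketch just fills the mapping, and error handling is
> omitted (MADV_PAGEOUT needs a v5.4+ kernel):
> 
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/mman.h>
>     #include <time.h>
> 
>     static double now_ms(void)
>     {
>             struct timespec ts;
> 
>             clock_gettime(CLOCK_MONOTONIC, &ts);
>             return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
>     }
> 
>     int main(void)
>     {
>             size_t len = 100UL << 20;        /* 100MiB of anon memory */
>             char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>             volatile char sink = 0;
>             double t0, t1, t2;
>             size_t i;
> 
>             memset(buf, 0x5a, len);          /* stand-in for the saved anon data */
> 
>             t0 = now_ms();
>             madvise(buf, len, MADV_PAGEOUT); /* swapout_start/end around this */
>             t1 = now_ms();
> 
>             for (i = 0; i < len; i += 4096)  /* read_data(): fault pages back in */
>                     sink += buf[i];
>             t2 = now_ms();
> 
>             printf("swapout %.0f ms, swapin %.0f ms\n", t1 - t0, t2 - t1);
>             return 0;
>     }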
> 

Please correct me if I am wrong, but you are saying that in your experiment
100 MiB took 90540 ms to compress/swapout and 45274 ms to
decompress/swapin when backed by 4k pages, but took 55667 ms and 22942 ms
when backed by 64k pages. Basically, the table shows the total time to
compress or decompress 100 MiB of memory, right?

> >
> > > [2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@xxxxxxxxx/
> >
> 
> Thanks
> Barry



