On Sat, Aug 24, 2024 at 5:56 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> Hi Barry,
>
> On Thu, Aug 22, 2024 at 05:13:06AM GMT, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 1:31 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > >
> > > On Wed, Aug 21, 2024 at 03:45:40PM GMT, hanchuanhua@xxxxxxxx wrote:
> > > > From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> > > >
> > > > 3. With both mTHP swap-out and swap-in supported, we offer the option to enable
> > > > zsmalloc compression/decompression with larger granularity[2]. The upcoming
> > > > optimization in zsmalloc will significantly increase swap speed and improve
> > > > compression efficiency. Tested by running 100 iterations of swapping 100MiB
> > > > of anon memory, the swap speed improved dramatically:
> > > >              time consumption of swapin(ms)  time consumption of swapout(ms)
> > > >   lz4 4k                  45274                          90540
> > > >   lz4 64k                 22942                          55667
> > > >   zstdn 4k                85035                         186585
> > > >   zstdn 64k               46558                         118533
> > >
> > > Are the above number with the patch series at [2] or without? Also can
> > > you explain your experiment setup or how can someone reproduce these?
> >
> > Hi Shakeel,
> >
> > The data was recorded after applying both this patch (swap-in mTHP) and
> > patch [2] (compressing/decompressing mTHP instead of page). However,
> > without the swap-in series, patch [2] becomes useless because:
> >
> > If we have a large object, such as 16 pages in zsmalloc,
> > do_swap_page will happen 16 times:
> > 1. decompress the whole large object and copy one page;
> > 2. decompress the whole large object and copy one page;
> > 3. decompress the whole large object and copy one page;
> > ....
> > 16. decompress the whole large object and copy one page;
> >
> > So, patchset [2] will actually degrade performance rather than
> > enhance it if we don't have this swap-in series. This swap-in
> > series is a prerequisite for the zsmalloc/zram series.
>
> Thanks for the explanation.
>
> >
> > We reproduced the data through the following simple steps:
> > 1. Collected anonymous pages from a running phone and saved them to a file.
> > 2. Used a small program to open and read the file into a mapped anonymous
> > memory.
> > 3. Do the belows in the small program:
> >    swapout_start_time
> >    madv_pageout()
> >    swapout_end_time
> >
> >    swapin_start_time
> >    read_data()
> >    swapin_end_time
> >
> > We calculate the throughput of swapout and swapin using the difference between
> > end_time and start_time. Additionally, we record the memory usage of zram after
> > the swapout is complete.
> >
>
> Please correct me if I am wrong but you are saying in your experiment,
> 100 MiB took 90540 ms to compress/swapout and 45274 ms to
> decompress/swapin if backed by 4k pages but took 55667 ms and 22942 ms
> if backed by 64k pages. Basically the table shows total time to compress
> or decomress 100 MiB of memory, right?

Hi Shakeel,

Tangquan (CC'd) collected the data and double-checked the case to confirm
the answer to your question. We have three cases:

1. no mTHP swap-in, no zsmalloc/zram multi-pages compression/decompression
2. have mTHP swap-in, no zsmalloc/zram multi-pages compression/decompression
3. have mTHP swap-in, have zsmalloc/zram multi-pages compression/decompression

The data above was 1 vs. 3. To provide more precise data that covers each
change, Tangquan tested 1 vs. 2 and 2 vs. 3 yesterday using LZ4 per my
request (the hardware might differ from the previous test, but the data
shows the same trend):

1. no mTHP swapin, no zsmalloc/zram patch
   swapin_ms:  30336    swapout_ms: 65651
2. have mTHP swapin, no zsmalloc/zram patch
   swapin_ms:  27161    swapout_ms: 61135
3. have mTHP swapin, have zsmalloc/zram patch
   swapin_ms:  13683    swapout_ms: 43305

The test pseudocode is as follows:

  addr = mmap(100M);
  read_anon_data_from_file_to_addr();
  for (i = 0; i < 100; i++) {
          swapout_start_time;
          madv_pageout();
          swapout_end_time;

          swapin_start_time;
          read_addr_to_swapin();
          swapin_end_time;
  }

So, while we saw some improvement from 1 to 2, the significant gains come
from using large blocks for compression and decompression. This mTHP
swap-in series ensures that mTHPs aren't lost after the first swap-in, so
the following 99 iterations continue to involve THP swap-out and mTHP
swap-in. The improvement from 1 to 2 is due to this mTHP swap-in series,
while the improvement from 2 to 3 comes from the zsmalloc/zram patchset
[2] you mentioned.

[2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@xxxxxxxxx/

Thanks
Barry