Hey folks,

This is a follow-up to my previously sent RFC patch to deprecate z3fold [1]. This is an RFC without code; I thought I could get some discussion going before writing (or rather deleting) more code. I went back to do some analysis on the 3 zpool allocators: zbud, zsmalloc, and z3fold.

[1] https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@xxxxxxxxxx/

In this analysis, for each of the allocators, I ran a kernel build test on tmpfs in a limited cgroup 5 times and captured:
(a) The build times.
(b) zswap_load() and zswap_store() latencies using bpftrace.
(c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.

Here are the results I have. I am using zsmalloc as the base for all comparisons.

-------------------------------- <Results> --------------------------------

(a) Build times

*** zsmalloc ***
──────────────────────────────────────────────────────────────
 LABEL │ MIN      │ MAX      │ MEAN     │ MEDIAN   │ STDDEV
───────┼──────────┼──────────┼──────────┼──────────┼──────────
 real  │ 108.890  │ 116.160  │ 111.304  │ 110.310  │ 2.719
 sys   │ 6838.860 │ 7137.830 │ 6936.414 │ 6862.160 │ 114.860
 user  │ 2838.270 │ 2859.050 │ 2850.116 │ 2852.590 │ 7.388
──────────────────────────────────────────────────────────────

*** zbud ***
──────────────────────────────────────────────────────────────
 LABEL │ MIN      │ MAX      │ MEAN     │ MEDIAN   │ STDDEV
───────┼──────────┼──────────┼──────────┼──────────┼──────────
 real  │ 105.540  │ 114.430  │ 108.738  │ 108.140  │ 3.027
 sys   │ 6553.680 │ 6794.330 │ 6688.184 │ 6661.840 │ 86.471
 user  │ 2836.390 │ 2847.850 │ 2842.952 │ 2843.450 │ 3.721
──────────────────────────────────────────────────────────────

*** z3fold ***
──────────────────────────────────────────────────────────────
 LABEL │ MIN      │ MAX      │ MEAN     │ MEDIAN   │ STDDEV
───────┼──────────┼──────────┼──────────┼──────────┼──────────
 real  │ 113.020  │ 118.110  │ 114.642  │ 114.010  │ 1.803
 sys   │ 7168.860 │ 7284.900 │ 7243.930 │ 7265.290 │ 42.254
 user  │ 2865.630 │ 2869.840 │ 2868.208 │ 2868.710 │ 1.625
──────────────────────────────────────────────────────────────

Comparing the mean real times, zbud is 2.3% faster than zsmalloc, and z3fold is 3% slower.

(b) zswap_load() and zswap_store() latencies

*** zsmalloc ***

@load_ns:
[128, 256)            377 |                                                    |
[256, 512)            772 |                                                    |
[512, 1K)             923 |                                                    |
[1K, 2K)            22141 |                                                    |
[2K, 4K)            88297 |                                                    |
[4K, 8K)          1685833 |@@@@@                                               |
[8K, 16K)        17087712 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)       10875077 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   |
[32K, 64K)         777656 |@@                                                  |
[64K, 128K)        127239 |                                                    |
[128K, 256K)        50301 |                                                    |
[256K, 512K)         1669 |                                                    |
[512K, 1M)             37 |                                                    |
[1M, 2M)                3 |                                                    |

@store_ns:
[512, 1K)             279 |                                                    |
[1K, 2K)            15969 |                                                    |
[2K, 4K)           193446 |                                                    |
[4K, 8K)           823283 |                                                    |
[8K, 16K)        14209844 |@@@@@@@@@@@                                         |
[16K, 32K)       62040863 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)        9737713 |@@@@@@@@                                            |
[64K, 128K)       1278302 |@                                                   |
[128K, 256K)       487285 |                                                    |
[256K, 512K)         4406 |                                                    |
[512K, 1M)            117 |                                                    |
[1M, 2M)               24 |                                                    |

*** zbud ***

@load_ns:
[128, 256)            452 |                                                    |
[256, 512)            834 |                                                    |
[512, 1K)             998 |                                                    |
[1K, 2K)            22708 |                                                    |
[2K, 4K)           171247 |                                                    |
[4K, 8K)          2853227 |@@@@@@@@                                            |
[8K, 16K)        17727445 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)        9523050 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
[32K, 64K)         752423 |@@                                                  |
[64K, 128K)        135560 |                                                    |
[128K, 256K)        52360 |                                                    |
[256K, 512K)         4071 |                                                    |
[512K, 1M)             57 |                                                    |

@store_ns:
[512, 1K)             518 |                                                    |
[1K, 2K)            13337 |                                                    |
[2K, 4K)           193043 |                                                    |
[4K, 8K)           846118 |                                                    |
[8K, 16K)        15240682 |@@@@@@@@@@@@@                                       |
[16K, 32K)       60945786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)       10230719 |@@@@@@@@                                            |
[64K, 128K)       1612647 |@                                                   |
[128K, 256K)       498344 |                                                    |
[256K, 512K)         8550 |                                                    |
[512K, 1M)            199 |                                                    |
[1M, 2M)                1 |                                                    |

*** z3fold ***

@load_ns:
[128, 256)            344 |                                                    |
[256, 512)            999 |                                                    |
[512, 1K)             859 |                                                    |
[1K, 2K)            21069 |                                                    |
[2K, 4K)            53704 |                                                    |
[4K, 8K)          1351571 |@@@@                                                |
[8K, 16K)        14142680 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)       11788684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[32K, 64K)        1133377 |@@@@                                                |
[64K, 128K)        121670 |                                                    |
[128K, 256K)        68663 |                                                    |
[256K, 512K)          120 |                                                    |
[512K, 1M)             21 |                                                    |

@store_ns:
[512, 1K)             257 |                                                    |
[1K, 2K)            10162 |                                                    |
[2K, 4K)           149599 |                                                    |
[4K, 8K)           648121 |                                                    |
[8K, 16K)         9115497 |@@@@@@@@                                            |
[16K, 32K)       56467456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)       16235236 |@@@@@@@@@@@@@@                                      |
[64K, 128K)       1397437 |@                                                   |
[128K, 256K)       705916 |                                                    |
[256K, 512K)         3087 |                                                    |
[512K, 1M)             62 |                                                    |
[1M, 2M)                1 |                                                    |

I did not perform any sophisticated analysis on these histograms, but eyeballing them makes it clear that all allocators have somewhat similar latencies. zbud is slightly better than zsmalloc, and z3fold is slightly worse than zsmalloc. This corresponds naturally to the build times in (a).

(c) Maximum size of the zswap pool

*** zsmalloc ***
1,137,659,904 bytes = ~1.13G

*** zbud ***
1,535,741,952 bytes = ~1.5G

*** z3fold ***
1,151,303,680 bytes = ~1.15G

Computed from the byte counts, zbud consumes ~35.0% more memory than zsmalloc, and z3fold consumes ~1.2% more memory. This makes sense because zbud only stores a maximum of two compressed pages on each order-0 page, regardless of the compression ratio, so it is bound to consume more memory.

-------------------------------- </Results> --------------------------------

According to those results, it seems like zsmalloc is superior to z3fold in both memory efficiency and latency. zbud has a small latency advantage, but that comes with a huge cost in terms of memory consumption. Moreover, most known users of zswap are currently using zsmalloc. Perhaps some folks are using zbud because it was the default allocator up until recently. The only known disadvantage of zsmalloc is the dependency on MMU.

Based on that, I think it doesn't make sense to keep all 3 allocators going forward. I believe we should start with removing either zbud or z3fold, leaving only one allocator that supports !MMU. Once zsmalloc supports !MMU (if possible), we can keep zsmalloc as the only allocator.

Thoughts and feedback are highly appreciated. I tried to CC all the interested folks, but others should feel free to chime in.
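
For anyone who wants to recheck the comparisons against zsmalloc, here is a minimal sketch of the arithmetic behind (a) and (c). The input numbers are copied from the results above; the helper name percent_vs_base is made up for illustration and is not part of any tool used in the test. The pool-size percentages here are taken from the exact byte counts, which gives roughly 35.0% for zbud and 1.2% for z3fold; the rounded G figures are too coarse for this (1.5/1.13 would give about 32.7%).

```python
# Mean "real" build times in seconds, copied from the tables in (a).
BUILD_REAL_MEAN = {"zsmalloc": 111.304, "zbud": 108.738, "z3fold": 114.642}

# Maximum zswap pool size in bytes, copied from (c).
POOL_MAX_BYTES = {
    "zsmalloc": 1_137_659_904,
    "zbud": 1_535_741_952,
    "z3fold": 1_151_303_680,
}


def percent_vs_base(values, base="zsmalloc"):
    """Return each entry's signed percent difference from the base entry.

    Hypothetical helper for illustration only. A negative result means the
    entry is smaller than the base (faster build, or smaller pool).
    """
    ref = values[base]
    return {
        name: (value - ref) / ref * 100.0
        for name, value in values.items()
        if name != base
    }


if __name__ == "__main__":
    for title, data in (
        ("mean real build time", BUILD_REAL_MEAN),
        ("max zswap pool size", POOL_MAX_BYTES),
    ):
        for name, pct in percent_vs_base(data).items():
            print(f"{title}: {name} is {pct:+.1f}% vs zsmalloc")
```

The same helper can be pointed at the sys or user means, or at the medians, by swapping in the corresponding values from the tables.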