On Thu, Feb 22, 2024 at 11:54:44AM +0800, Chengming Zhou wrote:
> On 2024/2/9 11:27, Yosry Ahmed wrote:
> > Hey folks,
> >
> > This is a follow up on my previously sent RFC patch to deprecate
> > z3fold [1]. This is an RFC without code, I thought I could get some
> > discussion going before writing (or rather deleting) more code. I went
> > back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
> > and z3fold.
>
> This is a great analysis! Sorry for being late to see it.
>
> I want to vote for this direction, zram has been using zsmalloc directly,
> zswap can also do this, which is simpler and we can just maintain and
> optimize only one allocator. The only evident downside is dependence on
> MMU, right?

AFAICT, yes. I saw a lot of positive responses when I sent an RFC to mark
z3fold as deprecated, but there were some opposing opinions as well, which
is why I did this simple analysis. I was hoping we could make forward
progress with that, but was disappointed that it didn't get as much
attention as the deprecation RFC :)

> And I'm trying to optimize the scalability of zsmalloc now, which is bad,
> so zswap has to use 32 pools to work around it. (zram only uses one pool,
> so it should also have the scalability problem on big servers; maybe it
> has to use many zram block devices to work around it too.)

That's slightly orthogonal. Zsmalloc is not really showing worse
performance than the other allocators, so this should be a separate
effort.

> But too many pools would cause more memory waste and more fragmentation,
> so the resulting compression ratio is not good enough.
>
> As for the MMU dependence, can we actually avoid it? Maybe I missed
> something, but we can get an object's memory vecs from zsmalloc, then
> send them to decompress, which should support length(memory vecs) > 1?

IIUC the dependency on MMU is due to the use of kmap() APIs and the fact
that we may be using highmem pages.
I think we may be able to work around that dependency, but I didn't look
closely. Hopefully Minchan or Sergey could shed more light on this.

> > [1]https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@xxxxxxxxxx/
> >
> > In this analysis, for each of the allocators I ran a kernel build test
> > on tmpfs in a limited cgroup 5 times and captured:
> > (a) The build times.
> > (b) zswap_load() and zswap_store() latencies using bpftrace.
> > (c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.
>
> This should be /proc/meminfo::Zswap, right?
> Zswap is the sum of the pool pages' size, Zswapped is the size of the
> swapped/compressed pages.

Oh yes, it is /proc/meminfo::Zswap actually. I miswrote it in my email.
Thanks!
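For anyone reproducing the measurement, the two fields are easy to confuse,
so here is a minimal sketch of pulling them out of /proc/meminfo (the
`parse_meminfo` helper and the sample text are mine, standing in for a real
read of the file; values in /proc/meminfo are reported in kB):

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their sizes in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields

# Stand-in for open("/proc/meminfo").read() on a zswap-enabled kernel.
sample = """\
Zswap:            1024 kB
Zswapped:         4096 kB
"""

info = parse_meminfo(sample)
# Per the definitions above: Zswap is what the compressed pool consumes,
# Zswapped is the original size of the pages stored in it, so the
# effective compression ratio is Zswapped / Zswap.
print(info["Zswap"], info["Zswapped"])
```

So for the pool-size measurement in (c), Zswap is the number to track.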