On Thu, Jun 6, 2024 at 3:36 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Tue, Jun 04, 2024 at 05:53:40PM +0000, Yosry Ahmed wrote:
> > Zswap creates multiple zpools to improve concurrency. Each zsmalloc
> > zpool creates its own 'zs_handle' and 'zspage' slab caches. Currently
> > we end up with 32 slab caches of each type.
> >
> > Since each slab cache holds some free objects, we end up with a lot
> > of free objects distributed among the separate zpool caches. Slab
> > caches are designed to handle concurrent allocations by using percpu
> > structures, so having a single instance of each cache should be
> > enough, and avoids wasting more memory than needed due to
> > fragmentation.
> >
> > Additionally, having more slab caches than needed unnecessarily
> > slows down code paths that iterate slab_caches.
> >
> > In the results reported by Eric in [1], the amount of unused slab
> > memory in these caches goes down from 242808 bytes to 29216 bytes
> > (-88%). This is calculated by (num_objs - active_objs) * objsize for
> > each 'zs_handle' and 'zspage' cache. Although this patch did not
> > help with the allocation failure reported by Eric with zswap +
> > zsmalloc, I think it is still worth merging on its own.
> >
> > [1] https://lore.kernel.org/lkml/20240604134458.3ae4396a@yea/
>
> I doubt this is the right direction.
>
> Zsmalloc is used for various purposes, each with different object
> lifecycles. For example, swap operations tend to involve relatively
> short-lived objects, while filesystem use cases might have
> longer-lived ones. This mix of lifecycles could lead to fragmentation
> with this approach.

Even in a swapfile, some objects can be short-lived and some can be
long-lived, and the line between swap and file systems becomes blurry
with shmem/tmpfs anyway.

I don't think having separate caches here is vital, but I am not
generally familiar with the file system use cases and I don't have data
to prove/disprove it.

>
> I believe the original problem arose when zsmalloc reduced its lock
> granularity from the class level to a global level. Zswap then went on
> to mitigate the issue with multiple zpools, but that is essentially
> another band-aid on top of the existing problem, IMO.

IIRC we reduced the granularity when we added writeback support to
zsmalloc, which was relatively recent, but I think we had seen lock
contention with zsmalloc long before that. We have had a similar patch
internally to use multiple zpools in zswap for many years now.

+Yu Zhao

Yu has more historical context here; I am hoping he can shed more light
on this.

>
> The correct approach would be to further reduce the zsmalloc lock
> granularity.

I definitely agree that the correct approach is to fix the lock
contention at the source and drop zswap's usage of multiple zpools.
Nonetheless, I think this patch provides value in the meantime. The
fragmentation within the slab caches is real with zswap's use case.
OTOH, fragmentation within a single slab cache shared between swap and
file system use cases seems like a less severe problem to me.

That being said, I don't feel strongly. If you really don't like this
patch, I am fine with dropping it.
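
For reference, a minimal userspace sketch (not part of the patch or the
thread) of the accounting mentioned above: it sums
(num_objs - active_objs) * objsize over the 'zs_handle' and 'zspage'
lines in /proc/slabinfo. It assumes those caches show up under their
literal names (slab merging may alias small caches on some configs) and
that /proc/slabinfo is readable, which typically requires root. The file
name and helper are hypothetical, e.g. zs_unused.c:

/*
 * Hypothetical helper: sum unused object memory for the 'zs_handle'
 * and 'zspage' slab caches, i.e. (num_objs - active_objs) * objsize,
 * as in the comparison referenced in the commit message.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512];
	unsigned long total = 0;

	if (!f) {
		perror("fopen /proc/slabinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char name[64];
		unsigned long active_objs, num_objs, objsize, unused;

		/* Data lines: <name> <active_objs> <num_objs> <objsize> ... */
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active_objs, &num_objs, &objsize) != 4)
			continue;	/* header lines fail to parse */

		if (strcmp(name, "zs_handle") && strcmp(name, "zspage"))
			continue;

		/* Multiple zpools can produce multiple lines per name. */
		unused = (num_objs - active_objs) * objsize;
		total += unused;
		printf("%-12s unused: %lu bytes\n", name, unused);
	}

	fclose(f);
	printf("total unused: %lu bytes\n", total);
	return 0;
}

Built with something like "gcc -o zs_unused zs_unused.c" and run as root
before and after the patch, the totals would give the same kind of
number as the 242808 -> 29216 byte comparison reported by Eric.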