On Thu, Jun 6, 2024 at 4:03 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Thu, Jun 6, 2024 at 3:36 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> >
> > On Tue, Jun 04, 2024 at 05:53:40PM +0000, Yosry Ahmed wrote:
> > > Zswap creates multiple zpools to improve concurrency. Each zsmalloc
> > > zpool creates its own 'zs_handle' and 'zspage' slab caches. Currently we
> > > end up with 32 slab caches of each type.
> > >
> > > Since each slab cache holds some free objects, we end up with a lot of
> > > free objects distributed among the separate zpool caches. Slab caches
> > > are designed to handle concurrent allocations by using percpu
> > > structures, so having a single instance of each cache should be enough,
> > > and avoids wasting more memory than needed due to fragmentation.
> > >
> > > Additionally, having more slab caches than needed unnecessarily slows
> > > down code paths that iterate slab_caches.
> > >
> > > In the results reported by Eric in [1], the amount of unused slab memory
> > > in these caches goes down from 242808 bytes to 29216 bytes (-88%). This
> > > is calculated as (num_objs - active_objs) * objsize for each 'zs_handle'
> > > and 'zspage' cache. Although this patch did not help with the allocation
> > > failure reported by Eric with zswap + zsmalloc, I think it is still
> > > worth merging on its own.
> > >
> > > [1] https://lore.kernel.org/lkml/20240604134458.3ae4396a@yea/
> >
> > I doubt this is the right direction.
> >
> > Zsmalloc is used for various purposes, each with different object
> > lifecycles. For example, swap operations involve relatively short-lived
> > objects, while filesystem use cases might have longer-lived objects.
> > This mix of lifecycles could lead to fragmentation with this approach.
>
> Even in a swapfile, some objects can be short-lived and some objects
> can be long-lived, and the line between swap and file systems becomes
> blurry with shmem/tmpfs. I don't think having separate caches here is
> vital, but I am not generally familiar with the file system use cases
> and I don't have data to prove/disprove it.
>
> >
> > I believe the original problem arose when zsmalloc reduced its lock
> > granularity from the class level to a global level. Zswap then tried
> > to mitigate the issue with multiple zpools, but that is essentially
> > another bandaid on top of the existing problem, IMO.
>
> IIRC we reduced the granularity when we added writeback support to
> zsmalloc, which was relatively recent. I think we have seen lock
> contention with zsmalloc long before that. We have had a similar patch
> internally to use multiple zpools in zswap for many years now.
>
> +Yu Zhao
>
> Yu has more historical context about this; I am hoping he will shed
> more light on it.
>
> >
> > The correct approach would be to further reduce the zsmalloc lock
> > granularity.
>
> I definitely agree that the correct approach is to fix the lock
> contention at the source and drop zswap's usage of multiple zpools.
> Nonetheless, I think this patch provides value in the meantime. The
> fragmentation within the slab caches is real with zswap's use case.
> OTOH, sharing a cache between swap and file system use cases leading
> to fragmentation within the same slab cache is a less severe problem
> in my opinion.
>
> That being said, I don't feel strongly. If you really don't like this
> patch, I am fine with dropping it.
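For context on the numbers quoted above: the unused slab memory figure
is just the (num_objs - active_objs) * objsize sum over the 'zs_handle'
and 'zspage' caches. A rough userspace sketch of that calculation (not
part of the patch; it assumes those cache names show up in
/proc/slabinfo and that the v2.x column order applies) would be:

/*
 * Sum (num_objs - active_objs) * objsize for the zsmalloc caches
 * listed in /proc/slabinfo. Typically needs root to read.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512], name[64];
	unsigned long active, num, objsize, total = 0;

	if (!f) {
		perror("fopen /proc/slabinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* columns: name <active_objs> <num_objs> <objsize> ... */
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active, &num, &objsize) != 4)
			continue;
		/* prefix match in case the cache names carry suffixes */
		if (!strncmp(name, "zs_handle", 9) ||
		    !strncmp(name, "zspage", 6))
			total += (num - active) * objsize;
	}
	fclose(f);
	printf("unused zsmalloc slab bytes: %lu\n", total);
	return 0;
}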
Oh, and I forgot to mention: Chengming said he is already working on
restoring the per-class lock and collecting lock contention data, so
maybe that will be enough after all. Ideally we want to compare the
following (a rough sketch of the two lock granularities follows the
list):

- single zpool with a per-pool lock
- multiple zpools with per-pool locks (current)
- single zpool with per-class locks
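To make the comparison concrete, here is a minimal kernel-style sketch
(made-up struct and field names, not actual zsmalloc code) of the two
lock granularities:

#include <linux/list.h>
#include <linux/spinlock.h>

#define NR_CLASSES 255			/* illustrative, not the real constant */

/* Coarse granularity: one pool-wide lock serializes every class. */
struct coarse_class {
	struct list_head partial;
};

struct coarse_pool {
	spinlock_t lock;		/* taken for any alloc/free in the pool */
	struct coarse_class classes[NR_CLASSES];
};

/*
 * Fine granularity: each size class carries its own lock, so alloc/free
 * operations that hit different classes do not contend (what restoring
 * the per-class lock would give back).
 */
struct fine_class {
	spinlock_t lock;		/* protects only this class */
	struct list_head partial;
};

struct fine_pool {
	struct fine_class classes[NR_CLASSES];
};

With the fine-grained layout, concurrent stores that land in different
size classes can proceed in parallel, which is the contention that the
multiple-zpool workaround currently papers over.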