On Fri, Nov 11, 2022 at 09:56:36AM +0900, Sergey Senozhatsky wrote:
> Hi,
>
> On (22/11/10 14:44), Minchan Kim wrote:
> > On Mon, Oct 31, 2022 at 02:40:59PM +0900, Sergey Senozhatsky wrote:
> > > Hello,
> > >
> > > Some use-cases and/or data patterns may benefit from
> > > larger zspages. Currently the limit on the number of physical
> > > pages that are linked into a zspage is hardcoded to 4. Higher
> > > limit changes key characteristics of a number of the size
> > > classes, improving compactness of the pool and redusing the
> > > amount of memory zsmalloc pool uses. More on this in 0002
> > > commit message.
> >
> > Hi Sergey,
> >
> > I think the idea that break of fixed subpages in zspage is
> > really good start to optimize further. However, I am worry
> > about introducing per-pool config this stage. How about
> > to introduce just one golden value for the zspage size?
> > order-3 or 4 in Kconfig with keeping default 2?
>
> Sorry, not sure I'm following. So you want a .config value
> for zspage limit? I really like the sysfs knob, because then
> one may set values on per-device basis (if they have multiple
> zram devices in a system with different data patterns):

Yes, I wanted a single global policy that drives zsmalloc smarter
without requiring a big effort from the user to decide on the right
tuning value (I think that decision process would be quite painful
for a normal user who doesn't have the resources to experiment),
since zsmalloc's design makes that possible. But as an interim
solution, until we prove there is no regression, we could just
provide a Kconfig option and remove it later once we add aggressive
zspage compaction (if that turns out to be necessary, please see
below), since a Kconfig option is easier to deprecate than a sysfs
knob.

> zram0 which is used as a swap device uses, say, 4
> zram1 which is vfat block device uses, say, 6
> zram2 which is ext4 block device uses, say, 8
>
> The whole point of the series is that one single value does
> not fit all purposes. There is no silver bullet.

I understand what you want to achieve with a per-pool config that
exposes the knob to the user, but my worry is still how a user could
decide on the best fit, since workloads are so dynamic. Some groups
have enough resources to run fleet-wide experiments while many
others don't, so if we really need the per-pool config step, I'd at
least like the documentation to provide a default guide for users
along with the tunable knobs for experimentation. Maybe we can
suggest 4 for the swap case and 8 for the fs case.

I don't disagree with the sysfs knobs for those use cases, but can't
we deal with the issue in a better way? In general, the bigger
pages_per_zspage is, the more memory we save. It is similar to
slab_order in the slab allocator, but slab has a limit because of
the cost of high-order allocations and the internal fragmentation of
larger-order slabs. zsmalloc is different in that it doesn't expose
memory addresses directly and it knows when an object is accessed by
the user; it doesn't need high-order allocations, either. That's how
zsmalloc is able to support object migration and page migration.
With those features, zsmalloc theoretically doesn't need a limit on
pages_per_zspage at all, so I am looking forward to seeing zsmalloc
handle the memory fragmentation problem in a better way.

My only concern with a bigger pages_per_zspage (e.g., 8 or 16) is
exhausting memory when zram is used for swap. That use case aims to
relieve memory pressure, but in the worst case, the bigger
pages_per_zspage is, the more chance there is of running out of
memory.
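To put rough numbers on that worst case, here is a small userspace
sketch (illustrative only, not zsmalloc code; the 3072-byte class
size is just an example): a nearly-empty zspage still pins
pages_per_zspage * PAGE_SIZE bytes while holding a single live
object, so the pinned-but-wasted memory grows directly with
pages_per_zspage.

#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
	/* hypothetical size class; any class behaves the same way */
	unsigned long object_size = 3072;
	unsigned int orders[] = { 4, 8, 16 };

	for (unsigned int i = 0; i < 3; i++) {
		unsigned long span = orders[i] * PAGE_SIZE;

		/* one live object left in the zspage before compaction */
		printf("pages_per_zspage=%u: spans %lu bytes, worst-case waste %lu bytes\n",
		       orders[i], span, span - object_size);
	}
	return 0;
}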
However, with *aggressive zspage compaction* we could bound the
worst case memory consumption to roughly:

    for class in classes:
        wasted_bytes += class->pages_per_zspage * PAGE_SIZE - an object size

Right now we rely on the shrinker to trigger compaction (which might
already be enough), but for the zram fs use case, which runs without
memory pressure, we could change the policy so that compaction is
triggered once the wasted memory in a size class crosses a threshold
we define.

What do you think?
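If it helps to make the idea concrete, here is a very rough
userspace sketch with made-up names (not a real zsmalloc API;
estimating waste from allocated-vs-used object counts is only one
possible way to approximate the bound above):

#include <stdbool.h>
#include <stdio.h>

struct class_stat {
	unsigned long obj_size;		/* object size of the class */
	unsigned long objs_allocated;	/* capacity of the allocated zspages */
	unsigned long objs_used;	/* live objects */
};

static unsigned long estimate_wasted_bytes(const struct class_stat *c,
					    unsigned int nr)
{
	unsigned long wasted = 0;

	for (unsigned int i = 0; i < nr; i++)
		wasted += (c[i].objs_allocated - c[i].objs_used) * c[i].obj_size;

	return wasted;
}

/* would be checked from the alloc/free paths instead of the shrinker */
static bool should_compact(const struct class_stat *c, unsigned int nr,
			   unsigned long threshold_bytes)
{
	return estimate_wasted_bytes(c, nr) > threshold_bytes;
}

int main(void)
{
	struct class_stat classes[] = {
		{ .obj_size = 3072, .objs_allocated = 400, .objs_used = 120 },
		{ .obj_size = 1024, .objs_allocated = 900, .objs_used = 850 },
	};
	unsigned long threshold = 512UL * 1024;	/* allow e.g. 512K of waste */

	printf("wasted=%lu bytes, compact=%d\n",
	       estimate_wasted_bytes(classes, 2),
	       (int)should_compact(classes, 2, threshold));
	return 0;
}

The point is just that the trigger becomes a per-pool waste
threshold rather than memory pressure, which seems to fit the fs use
case better.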