On Wed, Aug 14, 2024 at 04:48:42PM GMT, Yosry Ahmed wrote:
> On Wed, Aug 14, 2024 at 4:42 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> >
> > On Wed, Aug 14, 2024 at 04:03:13PM GMT, Nhat Pham wrote:
> > > On Wed, Aug 14, 2024 at 9:32 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > > >
> > > > Ccing Nhat
> > > >
> > > > On Wed, Aug 14, 2024 at 02:57:38PM GMT, Jesper Dangaard Brouer wrote:
> > > > > I suspect the next whack-a-mole will be the rstat flush for the slab code
> > > > > that kswapd also activates via shrink_slab, which via
> > > > > shrinker->count_objects() invokes count_shadow_nodes().
> > > > >
> > > >
> > > > Actually count_shadow_nodes() is already using the ratelimited version.
> > > > However zswap_shrinker_count() is still using the sync version. Nhat is
> > > > modifying this code at the moment and we can ask if we really need the
> > > > most accurate values for MEMCG_ZSWAP_B and MEMCG_ZSWAPPED for the zswap
> > > > writeback heuristic.
> > >
> > > You are referring to this, correct:
> > >
> > > 	mem_cgroup_flush_stats(memcg);
> > > 	nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > > 	nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> > >
> > > It's already a bit less than accurate - as you pointed out in another
> > > discussion, it takes into account the objects and sizes of the entire
> > > subtree, rather than just the ones charged to the current (memcg,
> > > node) combo. Feel free to optimize this away!
> > >
> > > In fact, I should probably replace this with another (atomic?) counter
> > > in the zswap_lruvec_state struct, which tracks the post-compression size.
> > > That way, we'll have a better estimate of the compression factor -
> > > total post-compression size / (length of LRU * page size) - and
> > > perhaps avoid the whole stat flushing path altogether...
> > >
> >
> > That sounds like a much better solution than relying on rstat for accurate
> > stats.
>
> We can also use such atomic counters in obj_cgroup_may_zswap() and
> eliminate the rstat flush there as well. Same for zswap_current_read()
> probably.
>
> Most in-kernel flushers really only need a few stats, so I am
> wondering if it's better to incrementally move these ones outside of
> the rstat framework and completely eliminate in-kernel flushers. For
> instance, MGLRU does not require the flush that reclaim does, as
> Shakeel pointed out.
>
> This will solve so many scalability problems that all of us have
> observed at some point or another and tried to optimize. I believe
> using rstat for userspace reads was the original intention anyway.

I like this direction and I think zswap would be a good first target.
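
For illustration, the atomic-counter idea discussed above can be sketched as a
small userspace model. This is a rough sketch only, not the actual kernel
implementation: the struct and function names (zswap_counters,
zswap_account_store, zswap_may_store, etc.) are hypothetical, and real kernel
code would use atomic64_t and the in-kernel atomics API rather than C11
stdatomic. The point it demonstrates is that readers (the shrinker's
compression-factor estimate, a may-store limit check) can consume plain atomic
counters without any rstat flush:

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/*
 * Hypothetical per-(memcg, node) counters, updated at zswap
 * store/free time. Relaxed ordering suffices: these are statistics,
 * not synchronization.
 */
struct zswap_counters {
	_Atomic uint64_t compressed_bytes; /* total post-compression size */
	_Atomic uint64_t stored_pages;     /* pages currently in zswap */
};

static void zswap_account_store(struct zswap_counters *c, uint64_t clen)
{
	atomic_fetch_add_explicit(&c->compressed_bytes, clen,
				  memory_order_relaxed);
	atomic_fetch_add_explicit(&c->stored_pages, 1, memory_order_relaxed);
}

static void zswap_account_free(struct zswap_counters *c, uint64_t clen)
{
	atomic_fetch_sub_explicit(&c->compressed_bytes, clen,
				  memory_order_relaxed);
	atomic_fetch_sub_explicit(&c->stored_pages, 1, memory_order_relaxed);
}

/*
 * Shrinker-side estimate in the spirit of Nhat's suggestion:
 * compression factor = (stored_pages * PAGE_SIZE) / compressed_bytes,
 * computed from the counters alone, with no mem_cgroup_flush_stats().
 */
static uint64_t zswap_compression_factor(const struct zswap_counters *c)
{
	uint64_t cbytes = atomic_load_explicit(&c->compressed_bytes,
					       memory_order_relaxed);
	uint64_t pages = atomic_load_explicit(&c->stored_pages,
					      memory_order_relaxed);

	return cbytes ? (pages * PAGE_SIZE) / cbytes : 0;
}

/* Flush-free limit check in the spirit of obj_cgroup_may_zswap(). */
static int zswap_may_store(const struct zswap_counters *c, uint64_t limit)
{
	return atomic_load_explicit(&c->compressed_bytes,
				    memory_order_relaxed) < limit;
}
```

The trade-off versus rstat is that every store/free pays for the atomic
updates (possibly contended across CPUs), whereas rstat batches per-CPU deltas
and pays at flush time; for a handful of hot counters read by in-kernel
consumers, the flush-free side of that trade is what the thread argues for.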