On Wed, Aug 14, 2024 at 4:42 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Wed, Aug 14, 2024 at 04:03:13PM GMT, Nhat Pham wrote:
> > On Wed, Aug 14, 2024 at 9:32 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > >
> > >
> > > Ccing Nhat
> > >
> > > On Wed, Aug 14, 2024 at 02:57:38PM GMT, Jesper Dangaard Brouer wrote:
> > > > I suspect the next whac-a-mole will be the rstat flush for the slab code
> > > > that kswapd also activates via shrink_slab, that via
> > > > shrinker->count_objects() invoke count_shadow_nodes().
> > > >
> > >
> > > Actually count_shadow_nodes() is already using ratelimited version.
> > > However zswap_shrinker_count() is still using the sync version. Nhat is
> > > modifying this code at the moment and we can ask if we really need most
> > > accurate values for MEMCG_ZSWAP_B and MEMCG_ZSWAPPED for the zswap
> > > writeback heuristic.
> >
> > You are referring to this, correct:
> >
> > mem_cgroup_flush_stats(memcg);
> > nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> >
> > It's already a bit less-than-accurate - as you pointed out in another
> > discussion, it takes into account the objects and sizes of the entire
> > subtree, rather than just the ones charged to the current (memcg,
> > node) combo. Feel free to optimize this away!
> >
> > In fact, I should probably replace this with another (atomic?) counter
> > in zswap_lruvec_state struct, which tracks the post-compression size.
> > That way, we'll have a better estimate of the compression factor -
> > total post-compression size / (length of LRU * page size), and
> > perhaps avoid the whole stat flushing path altogether...
> >
>
> That sounds like much better solution than relying on rstat for accurate
> stats.

We can also use such atomic counters in obj_cgroup_may_zswap() and
eliminate the rstat flush there as well. Same for zswap_current_read()
probably.

Most in-kernel flushers really only need a few stats, so I am wondering
if it's better to incrementally move these ones outside of the rstat
framework and completely eliminate in-kernel flushers. For instance,
MGLRU does not require the flush that reclaim does, as Shakeel pointed
out.

This will solve so many scalability problems that all of us have
observed at some point or another and tried to optimize. I believe
using rstat for userspace reads was the original intention anyway.
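
For the zswap side, something along these lines is what I imagine
(completely untested, and the field/helper names are made up for
illustration, not actual mm/zswap.c code):

	/* New field in struct zswap_lruvec_state: post-compression bytes
	 * charged to this (memcg, node). */
	atomic64_t nr_compressed_bytes;

	/* Bumped when an entry is added to the LRU, decremented when it is
	 * freed or written back (entry->length is the compressed size). */
	static void zswap_lru_add_compressed(struct zswap_lruvec_state *state,
					     size_t length)
	{
		atomic64_add(length, &state->nr_compressed_bytes);
	}

	static void zswap_lru_sub_compressed(struct zswap_lruvec_state *state,
					     size_t length)
	{
		atomic64_sub(length, &state->nr_compressed_bytes);
	}

	/* zswap_shrinker_count() could then read the per-(memcg, node)
	 * counter directly instead of mem_cgroup_flush_stats() +
	 * memcg_page_state(): */
	nr_backing = atomic64_read(&state->nr_compressed_bytes) >> PAGE_SHIFT;

The compression factor estimate then becomes nr_compressed_bytes /
(LRU length * PAGE_SIZE), scoped to the right (memcg, node) combo and
with no rstat flush anywhere in the shrinker path.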