> -----Original Message----- > From: Nhat Pham <nphamcs@xxxxxxxxx> > Sent: Tuesday, September 24, 2024 4:11 PM > To: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux- > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx; > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx; > shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; > Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>; > joshua.hahnjy@xxxxxxxxx > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in > zswap_store(). > > On Tue, Sep 24, 2024 at 2:38 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> > wrote: > > > > > > We can also do what we discussed before about double charging. The > > pages that are being reclaimed are already charged, so technically we > > don't need to charge them again. We can uncharge the difference > > between compressed and uncompressed sizes after compression and call > > it a day. This fixes the limit checking and the double charging in one > > go. > > I am a little bit nervous though about zswap uncharing the pages from > > under reclaim, there are likely further accesses of the page memcg > > after zswap. Maybe we can plumb the info back to reclaim or set a flag > > on the page to avoid uncharging it when it's freed. > > Hmm this is just for memory usage charging, no? The problem here is > the zswap usage (zswap.current), and its relation to the limit. > > One thing we can do is check the zswap usage against the limit for > every subpage, but that's likely expensive...? This is the approach currently implemented in v7. Data gathered doesn’t indicate a performance issue with this specific workload in the two scenarios validated, namely, zswap-4K vs. zswap-mTHP and SSD-mTHP vs. zswap-mTHP (we only see performance gains with explainable sys time increase). Of course, the existing implementation could be a baseline for validating performance of other approaches, e.g., checking zswap usage per mTHP. However, these other approaches would also need to be evaluated for more global multi-instance implications as far as all processes being able to make progress. > > With the new atomic counters Joshua is working on, we can > check-and-charge at the same time, after we have compressed the whole > large folio, like this: > > for (memcg = original_memcg; !mem_cgroup_is_root(memcg); > memcg = parent_mem_cgroup(memcg)); > old_usage = atomic_read(&memcg->zswap); > > do { > new_usage = old_usage + size; > if (new_usage > limit) { > /* undo charging of descendants, then return false */ > } > } while (!atomic_try_cmpxchg(&memcg->zswap, old_usage, new_usage)) > } > > But I don't know what we can do in the current design. I gave it some > more thought, and even if we only check after we know the size, we can > still potentially overshoot the limit :( I agree. Moreover, these checks based on estimated ratio or compressed size could also add overhead in the normal case where we are not near the usage limits.