On Wed, Sep 25, 2024 at 11:30:34AM -0700, Yosry Ahmed wrote:
> Johannes wrote:
> > If this ever becomes an issue, we can handle it in a fastpath-slowpath
> > scheme: check the limit up front for fast-path failure if we're
> > already maxed out, just like now; then make obj_cgroup_charge_zswap()
> > atomically charge against zswap.max and unwind the store if we raced.
> >
> > For now, I would just keep the simple version we currently have: check
> > once in zswap_store() and then just go ahead for the whole folio.
>
> I am not totally against this, but I feel like this is too optimistic.
> I think we can keep it simple-ish by maintaining an EWMA for the
> compression ratio; we already have primitives for this (see
> DECLARE_EWMA).
>
> Then in zswap_store(), we can use the EWMA to estimate the compressed
> size and use it to do the memcg and global limit checks once, like we
> do today. Instead of just checking whether we are below the limits, we
> check whether we have enough headroom for the estimated compressed
> size. Then we call zswap_store_page() to do the per-page stuff, then
> do batched charging and stats updates.

I'm not sure what you gain from making a non-atomic check precise.

You can get a hundred threads each determining precisely that *their*
store will fit exactly into the last 800kB before the limit.

> If you think that's overkill, we can keep doing the limit checks as
> we do today; I just don't see how it would make a practical difference.

What would make a difference is atomic transactional charging of the
compressed size, and unwinding on failure - with the upfront check to
avoid pointlessly compressing (outside of race conditions).

And I'm not against doing that in general, I am just against doing it
by default. It's a lot of complexity, and like I said, the practical
use case for limiting zswap memory to begin with is quite unclear to me.

Zswap is not a limited resource. It's just memory. And you already had
the memory for the uncompressed copy. So it's a bit strange to me to
say "you have compressed your memory enough, so now you get sent to
disk (or we declare OOM)".

What would be a reason to limit it? It sort of makes sense as a binary
switch, but I don't get the use case for a granular limit. (And I blame
my own cowardice for making the cgroup knob a limit, to keep options
open, instead of a switch.)

All that to say, this would be better in a follow-up patch. We allow
overshooting now; it's not clear how overshooting by a larger amount
makes a categorical difference.

> but I would still like to see batching of all the limit checks,
> charging, and stats updates. It makes little sense otherwise.

Definitely. One check, one charge, one stat update per folio.
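
For concreteness, here is a stand-alone sketch of that flow in plain
C11 - a toy model with made-up names and a fixed byte limit, not
actual zswap or objcg code: one cheap upfront check, one transactional
try-charge of the folio's final compressed size, one stat update, and
an unwind path when the charge loses the race.

	/*
	 * Stand-alone model of "check once, compress the whole folio, then one
	 * atomic try-charge against the limit, unwinding on a lost race".
	 * All names are made up; this is not zswap code.
	 */
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <stdio.h>

	static atomic_size_t pool_charged;		/* bytes currently charged */
	static const size_t pool_limit = 100 << 20;	/* stand-in for zswap.max */

	/* Cheap upfront check: don't bother compressing if clearly maxed out. */
	static bool pool_over_limit(void)
	{
		return atomic_load(&pool_charged) >= pool_limit;
	}

	/*
	 * Transactional charge of the final compressed size: cmpxchg loop so
	 * that concurrent stores can't all squeeze into the same headroom.
	 */
	static bool pool_try_charge(size_t bytes)
	{
		size_t old = atomic_load(&pool_charged);

		do {
			if (old + bytes > pool_limit)
				return false;	/* lost the race, caller unwinds */
		} while (!atomic_compare_exchange_weak(&pool_charged, &old,
						       old + bytes));
		return true;
	}

	static void pool_uncharge(size_t bytes)
	{
		atomic_fetch_sub(&pool_charged, bytes);
	}

	/* Folio-store skeleton: one check, one charge, one stat update. */
	static bool store_folio(const size_t *compressed_size, int nr_pages)
	{
		size_t total = 0;
		int i;

		if (pool_over_limit())
			return false;			/* fast-path failure */

		for (i = 0; i < nr_pages; i++)		/* "compress" each page */
			total += compressed_size[i];

		if (!pool_try_charge(total)) {
			/* unwind: real code would free the compressed copies here */
			return false;
		}

		/* single stat update for the whole folio would go here */
		return true;
	}

	int main(void)
	{
		size_t sizes[4] = { 1200, 900, 2048, 700 };

		printf("stored: %d\n", store_folio(sizes, 4));
		printf("charged: %zu bytes\n", atomic_load(&pool_charged));
		return 0;
	}

The cmpxchg loop is what makes the charge transactional: a hundred
racing threads can all pass the cheap upfront check, but only the ones
whose compressed totals actually fit get charged; the rest unwind
instead of overshooting the limit.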