On Wed, Mar 20, 2024 at 2:07 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Mar 20, 2024 at 01:49:17PM -0700, Yosry Ahmed wrote:
> > Hey folks,
> >
> > I was looking at cleaning up the same-filled handling code in zswap,
> > when it hit me that after the xarray conversion, the only member of
> > struct zswap_entry that is relevant to same-filled pages is now the
> > objcg pointer.
> >
> > The xarray allows a pointer to be tagged by up to two tags (1 and 3),
> > so we can completely avoid allocating a zswap_entry for same-filled
> > pages by storing a tagged objcg pointer directly in the xarray
> > instead.
> >
> > Basically the xarray would then either have a pointer to struct
> > zswap_entry or struct obj_cgroup, where the latter is tagged as
> > SAME_FILLED_ONE or SAME_FILLED_ZERO.
> >
> > There are two benefits of this:
> > - Saving some memory (precisely 64 bytes per same-filled entry).
> > - Further separating handling of same-filled pages from compressed
> >   pages, which results in some nice cleanups (especially in
> >   zswap_store()). It also makes further improvements easier (e.g.
> >   skipping limit checking for same-filled entries).
>
> This sounds interesting.
>
> Where would you store the byte value it's filled with? Or would you
> limit it to zero-filled only?

Oh yeah, I should have stated this explicitly: it would be limited to
pages filled with the same bit (all 0s or all 1s), assuming that most
same-filled entries would be zero-filled anyway.

> > The disadvantage is obviously the complexity needed to handle two
> > different types of pointers in the xarray, although I think with the
> > correct abstractions this is not a big deal.
> >
> > I have some untested patches that implement this that I plan on
> > testing and sending out at some point; the reason I am sending this
> > RFC now is to gauge interest. I am not sure how common same-filled
> > pages are.
> > Unfortunately, this data is not easy to collect from our
> > fleet (still working on it), so if someone has data from actual
> > workloads that would be helpful.
> >
> > Running the kernel build test only shows a small amount of same-filled
> > pages landing in zswap, but I am thinking maybe actual workloads have
> > more zeroed pages lying around.
>
> In our fleet, same-filled pages seem to average pretty consistently at
> 10% of total stored entries.
>
> I'd assume they're mostly zero-filled instead of other patterns, but I
> don't have any data on that.

I think this would be key to knowing whether this is doable. If most
same-filled pages are zero pages, we can limit zswap's same-filled
feature to those and do the optimization. Otherwise we cannot do it,
as we currently store an entire word as the same-filled "value".

Alternatively, we can have a separate struct that only has the objcg
and value for same-filled pages. This means we still use 16 bytes (on
x86_64) per same-filled entry. I think we can still get the code
cleanups by separating handling for same-filled pages, though?

>
> The savings from the entry would be a few megabytes per host, probably
> not an argument by itself. But the code simplifications and shaving a
> few cycles off the fast paths do sound promising.