On Sat, Nov 2, 2024 at 3:12 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> From: Barry Song <v-songbaohua@xxxxxxxx>
>
> When the proportion of folios from the zero map is small, missing their
> accounting may not significantly impact profiling. However, it’s easy
> to construct a scenario where this becomes an issue: for example,
> allocating 1 GB of memory, writing zeros from userspace, followed by
> MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out
> and swap-in counts seem to vanish into a black hole, potentially
> causing semantic ambiguity.
>
> We have two ways to address this:
>
> 1. Add a separate counter specifically for the zero map.
> 2. Continue using the current accounting, treating the zero map like
>    a normal backend. (This aligns with the current behavior of zRAM
>    when supporting same-page fills at the device level.)
>
> This patch adopts option 1, because the pswpin/pswpout counters only
> apply to I/O done directly to the backend device (as noted by Nhat
> Pham).
>
> The new counters appear in /proc/vmstat (counters for the whole
> system) and in memory.stat (counters for the memcg of interest).
>
> For example:
>
> $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
> swpin_zero 1648
> swpout_zero 33536
>
> $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
> swpin_zero 3905
> swpout_zero 3985

LGTM FWIW, so I'll leave my review tag here:

Reviewed-by: Nhat Pham <nphamcs@xxxxxxxxx>

Too many emails in this thread, but my opinions are:

1. A Fixes tag is appropriate. It's not a kernel bug per se, but it's
incredibly confusing, and it can throw off userspace agents that rely
on the rate of change of these counters as signals.

2. I do think we should use a separate set of counters for this
optimization. No strong opinion on combining them with the zswap
counters, but that can get confusing for users when they enable or
disable zswap. If we do combine them, I'd be much more comfortable
with a generic name, like the one David suggested in v1 ("swpin_skip"
/ "swpout_skip"). That would still require some API change tho, so not
sure if it's the best approach? :) It would also be appropriate if we
bring back the same-filled optimization (which should be doable in the
swap ID world, but I digress).
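
For anyone who wants to see the "black hole" firsthand, here is a
minimal repro sketch of the changelog's scenario. It assumes a 5.4+
kernel (for MADV_PAGEOUT), active swap, and a 4 KiB base page size,
and it skips most error handling:

/* Touch 1 GB of zero-filled anonymous memory, page it out, then
 * fault it back in; the resulting swap traffic is what the changelog
 * says previously landed in no counter at all. */
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1UL << 30;	/* 1 GB */
	volatile char sink = 0;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0, len);		 /* write zeros from userspace */
	madvise(buf, len, MADV_PAGEOUT); /* ask to swap the range out */
	for (size_t i = 0; i < len; i += 4096)
		sink += buf[i];		 /* fault everything back in */
	munmap(buf, len);
	return 0;
}

With the patch applied, running the grep commands above before and
after this program should show swpout_zero/swpin_zero moving by up to
1 GB worth of pages, depending on how much MADV_PAGEOUT actually
managed to reclaim.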
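
And on the agent point in (1), here is a rough sketch of the kind of
rate-of-change signal such an agent might compute, sampling the two
counters from /proc/vmstat once a second. The counter names come from
this patch; the sampling loop itself is purely illustrative:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the value of one /proc/vmstat counter, or 0 if absent. */
static unsigned long vmstat_read(const char *name)
{
	char key[64];
	unsigned long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 0;
	while (fscanf(f, "%63s %lu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			fclose(f);
			return val;
		}
	}
	fclose(f);
	return 0;
}

int main(void)
{
	unsigned long in = vmstat_read("swpin_zero");
	unsigned long out = vmstat_read("swpout_zero");

	for (;;) {
		sleep(1);
		unsigned long nin = vmstat_read("swpin_zero");
		unsigned long nout = vmstat_read("swpout_zero");

		printf("swpin_zero/s: %lu swpout_zero/s: %lu\n",
		       nin - in, nout - out);
		in = nin;
		out = nout;
	}
}

A counter that jumps here without any matching pswpin/pswpout movement
is exactly the zero-map traffic this patch makes visible.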