On 02.11.24 11:12, Barry Song wrote:
From: Barry Song <v-songbaohua@xxxxxxxx>
When the proportion of folios from the zero map is small, missing their
accounting may not significantly impact profiling. However, it’s easy
to construct a scenario where this becomes an issue. For example:
allocate 1 GB of memory, write zeros to it from userspace, page it out
with MADV_PAGEOUT, and then swap it back in. In this case, the swap-out
and swap-in counts seem to vanish into a black hole, which makes the
statistics ambiguous.
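The scenario above can be reproduced with a small userspace program. The following is a minimal sketch, assuming Linux 5.4 or later (for MADV_PAGEOUT) and an active swap device. The function name zero_swap_roundtrip is hypothetical and is not part of this patch. Without swap, the madvise() call may fail or reclaim nothing, and in that case no counter will move.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux >= 5.4; value from the kernel UAPI headers */
#endif

/*
 * Hypothetical helper: fill len bytes of anonymous memory with zeros, ask
 * the kernel to page it out, then read it back (faulting it in again if it
 * was swapped out). Returns the number of non-zero bytes seen after the
 * round trip (expected: 0), or -1 if the mapping could not be created.
 */
long zero_swap_roundtrip(size_t len)
{
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Writing (not just reading) allocates real pages instead of the shared zero page. */
	memset(buf, 0, len);

	/* Best effort: without swap, or on kernels older than 5.4, this may fail or reclaim nothing. */
	if (madvise(buf, len, MADV_PAGEOUT) != 0)
		perror("madvise(MADV_PAGEOUT)");

	/* Touch every byte; swapped-out pages are faulted back in here. */
	long nonzero = 0;
	for (size_t i = 0; i < len; i++)
		nonzero += buf[i] != 0;

	munmap(buf, len);
	return nonzero;
}
```

Comparing /proc/vmstat before and after calling zero_swap_roundtrip((size_t)1 << 30) shows whether the 1 GB round trip is visible in any counter.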
We have two ways to address this:
1. Add a separate counter specifically for the zero map.
2. Keep using the existing pswpin/pswpout counters, treating the zero
map like a normal backend. (This aligns with the current behavior of
zram when supporting same-page fills at the device level.)
This patch adopts option 1, because the pswpin/pswpout counters are
meant to cover only I/O done directly to the backend device (as noted
by Nhat Pham).
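To make the difference between the two options concrete, here is a userspace model of option 1. It is only an illustration: the struct and function names are hypothetical, a 4 KiB page size is assumed, and the real kernel code operates on folios and updates vm event counters rather than a plain struct.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the vmstat counters discussed in this patch. */
struct swap_counters {
	unsigned long pswpin;      /* pages read from the swap device */
	unsigned long pswpout;     /* pages written to the swap device */
	unsigned long swpin_zero;  /* pages restored from the zero map, no read I/O */
	unsigned long swpout_zero; /* pages recorded in the zero map, no write I/O */
};

static bool page_is_zero_filled(const unsigned char *page)
{
	for (size_t i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i] != 0)
			return false;
	return true;
}

/* Option 1 on swap-out: zero-filled pages get their own counter. */
void account_swapout(struct swap_counters *c, const unsigned char *page)
{
	if (page_is_zero_filled(page))
		c->swpout_zero++;
	else
		c->pswpout++;
}

/* Option 1 on swap-in: a zero-map hit never touches the device. */
void account_swapin(struct swap_counters *c, bool slot_in_zero_map)
{
	if (slot_in_zero_map)
		c->swpin_zero++;
	else
		c->pswpin++;
}
```

Under option 2, the zero-filled branches would simply increment pswpout and pswpin as well, the way same-filled pages written to zram still show up in those counters.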
These counters can be found in /proc/vmstat (system-wide counters) and
in memcg's memory.stat (counters for the memcg of interest).
For example:
$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536
$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985
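For tools that want these values programmatically rather than via grep, a minimal C helper could look like the following. It assumes the one "name value" pair per line format used by both /proc/vmstat and memory.stat; the function name find_stat_counter is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up "name value" in the text of /proc/vmstat or
 * a memcg memory.stat file (one counter per line). Returns true and stores
 * the value if the counter is present.
 */
bool find_stat_counter(const char *text, const char *name, unsigned long long *value)
{
	size_t name_len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, so "swpin" does not match "swpin_zero". */
		if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ')
			return sscanf(line + name_len, "%llu", value) == 1;
		line = strchr(line, '\n');
		if (line)
			line++; /* skip past the newline to the next counter */
	}
	return false;
}
```

Read the whole file into a NUL-terminated buffer first (for example with fread()), then call find_stat_counter(buf, "swpout_zero", &v).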
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Cc: Usama Arif <usamaarif642@xxxxxxxxx>
Cc: Chengming Zhou <chengming.zhou@xxxxxxxxx>
Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
Cc: Nhat Pham <nphamcs@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: Chris Li <chrisl@xxxxxxxxxx>
Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Kairui Song <kasong@xxxxxxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
---
-v2:
* add separate counters rather than using pswpin/out; thanks
for the comments from Usama, David, Yosry and Nhat;
* Usama also suggested a new counter like swapped_zero; I
prefer to send that separately as an enhancement patch rather
than fold it into this hotfix, and will probably handle it later.
Documentation/admin-guide/cgroup-v2.rst | 10 ++++++++++
include/linux/vm_event_item.h | 2 ++
mm/memcontrol.c | 4 ++++
mm/page_io.c | 16 ++++++++++++++++
mm/vmstat.c | 2 ++
5 files changed, 34 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index db3799f1483e..984eb3c9d05b 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1599,6 +1599,16 @@ The following nested keys are defined.
pglazyfreed (npn)
Amount of reclaimed lazyfree pages
+ swpin_zero
+ Number of pages moved into memory with zero content, meaning no
+ copy exists in the backend swapfile, allowing swap-in to avoid
+ I/O read overhead.
+
+ swpout_zero
+ Number of pages moved out of memory with zero content, meaning no
+ copy is needed in the backend swapfile, allowing swap-out to avoid
+ I/O write overhead.
Hm, could we make it a bit clearer that this is a pure optimization,
and refer to the other counters?
swpin_zero
Portion of "pswpin" pages for which I/O was optimized out
because the page content was detected to be zero during swapout.
swpout_zero
Portion of "pswpout" pages for which I/O was optimized out
because the page content was detected to be zero.
--
Cheers,
David / dhildenb