David Hildenbrand <david@xxxxxxxxxx> writes:

> On 22.08.23 20:05, Stefan Roesch wrote:
>> With madvise and prctl KSM can be enabled for different VMAs. Once it
>> is enabled we can query how effective KSM is overall. However we cannot
>> easily query if an individual VMA benefits from KSM.
>>
>> This commit adds a KSM section to the /proc/<pid>/smaps file. It reports
>> how many of the pages are KSM pages. The returned value for KSM is
>> independent of the use of the shared zeropage.
>
> Maybe phrase that as something like "The returned value for KSM includes
> KSM-placed zeropages, so we can observe the actual KSM benefit independent
> of the usage of the shared zeropage for KSM."
>
> But thinking about it (see below), maybe we really just let any user figure
> that out by temporarily disabling the shared zeropage.
>
> So this would be
>
> "It reports how many of the pages are KSM pages. Note that KSM-placed
> zeropages are not included, only actual KSM pages."
>

I'll replace the commit message and the documentation with the above
sentence.

>> Here is a typical output:
>>
>> 7f420a000000-7f421a000000 rw-p 00000000 00:00 0
>> Size:             262144 kB
>> KernelPageSize:        4 kB
>> MMUPageSize:           4 kB
>> Rss:               51212 kB
>> Pss:                8276 kB
>> Shared_Clean:        172 kB
>> Shared_Dirty:      42996 kB
>> Private_Clean:       196 kB
>> Private_Dirty:      7848 kB
>> Referenced:        15388 kB
>> Anonymous:         51212 kB
>> KSM:               41376 kB
>> LazyFree:              0 kB
>> AnonHugePages:         0 kB
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:             202016 kB
>> SwapPss:            3882 kB
>> Locked:                0 kB
>> THPeligible:           0
>> ProtectionKey:         0
>> ksm_state:             0
>> ksm_skip_base:         0
>> ksm_skip_count:        0
>> VmFlags: rd wr mr mw me nr mg anon
>>
>> This information also helps with the following workflow:
>>
>> - First enable KSM for all the VMAs of a process with prctl.
>> - Then analyze with the above smaps report which VMAs benefit the most.
>> - Change the application (if possible) to add the corresponding madvise
>>   calls for the VMAs that benefit the most.
>>
>> Signed-off-by: Stefan Roesch <shr@xxxxxxxxxxxx>
>> ---
>>   Documentation/filesystems/proc.rst |  4 ++++
>>   fs/proc/task_mmu.c                 | 16 +++++++++++-----
>>   2 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
>> index 7897a7dafcbc..d5bdfd59f5b0 100644
>> --- a/Documentation/filesystems/proc.rst
>> +++ b/Documentation/filesystems/proc.rst
>> @@ -461,6 +461,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
>>       Private_Dirty:         0 kB
>>       Referenced:          892 kB
>>       Anonymous:             0 kB
>> +    KSM:                   0 kB
>>       LazyFree:              0 kB
>>       AnonHugePages:         0 kB
>>       ShmemPmdMapped:        0 kB
>> @@ -501,6 +502,9 @@ accessed.
>>   a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>   and a page is modified, the file page is replaced by a private anonymous copy.
>>
>> +"KSM" shows the amount of anonymous memory that has been de-duplicated. The
>> +value is independent of the use of the shared zeropage.
>
> Maybe here as well.
>
>> +
>>   "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
>>   The memory isn't freed immediately with madvise(). It's freed in memory
>>   pressure if the memory is clean. Please note that the printed value might
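For reference, here is a minimal userspace sketch of the prctl + madvise
workflow the commit message describes (hypothetical example, not part of the
patch; it assumes a kernel that supports PR_SET_MEMORY_MERGE, i.e. 6.4+, and
defines the constant in case the installed headers predate it):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67	/* since Linux 6.4 */
#endif

int main(void)
{
	size_t len = 16UL << 20;	/* hypothetical 16 MiB region */
	void *buf;

	/* Step 1: enable KSM for all anonymous VMAs of this process. */
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
		perror("prctl(PR_SET_MEMORY_MERGE)");

	/*
	 * Step 2 happens offline: run the workload and look at the
	 * per-VMA "KSM:" lines in /proc/<pid>/smaps to see which VMAs
	 * actually benefit.
	 */

	/* Step 3: change the application to opt in only those VMAs. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	if (madvise(buf, len, MADV_MERGEABLE))
		perror("madvise(MADV_MERGEABLE)");

	return 0;
}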
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 51315133cdc2..4532caa8011c 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -4,6 +4,7 @@
>>   #include <linux/hugetlb.h>
>>   #include <linux/huge_mm.h>
>>   #include <linux/mount.h>
>> +#include <linux/ksm.h>
>>   #include <linux/seq_file.h>
>>   #include <linux/highmem.h>
>>   #include <linux/ptrace.h>
>> @@ -396,6 +397,7 @@ struct mem_size_stats {
>>   	unsigned long swap;
>>   	unsigned long shared_hugetlb;
>>   	unsigned long private_hugetlb;
>> +	unsigned long ksm;
>>   	u64 pss;
>>   	u64 pss_anon;
>>   	u64 pss_file;
>> @@ -435,9 +437,9 @@ static void smaps_page_accumulate(struct mem_size_stats *mss,
>>   	}
>>   }
>>
>> -static void smaps_account(struct mem_size_stats *mss, struct page *page,
>> -		bool compound, bool young, bool dirty, bool locked,
>> -		bool migration)
>> +static void smaps_account(struct mem_size_stats *mss, pte_t *pte,
>> +		struct page *page, bool compound, bool young, bool dirty,
>> +		bool locked, bool migration)
>>   {
>>   	int i, nr = compound ? compound_nr(page) : 1;
>>   	unsigned long size = nr * PAGE_SIZE;
>> @@ -452,6 +454,9 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
>>   		mss->lazyfree += size;
>>   	}
>>
>> +	if (PageKsm(page) && (!pte || !is_ksm_zero_pte(*pte)))
>
> I think this won't work either way, because smaps_pte_entry() never ends up
> calling this function with !page. And the shared zeropage here always gives
> us !page.
>
> What would work is:
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 15ddf4653a19..ef6f39d7c5a2 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -528,6 +528,9 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>                  page = vm_normal_page(vma, addr, ptent);
>                  young = pte_young(ptent);
>                  dirty = pte_dirty(ptent);
> +
> +               if (!page && is_ksm_zero_pte(ptent))
> +                       mss->ksm += size;
>          } else if (is_swap_pte(ptent)) {
>                  swp_entry_t swpent = pte_to_swp_entry(ptent);
>
> That means that "KSM" can be bigger than "Anonymous" and "RSS" when the
> shared zeropage is used.
>
> Interestingly, right now we account each KSM page individually towards
> "Anonymous" and "RSS".
>
> So if we have 100 times the same KSM page in a VMA, we will have 100 times
> anon and 100 times rss.
>
> Thinking about it, I guess considering the KSM-placed zeropage indeed adds
> more confusion to that. Eventually, we might just want a separate
> "Shared-zeropages" count.
>
> So maybe v3 is better, clarifying in the documentation that the KSM-placed
> zeropage is not considered.
>
> Sorry for changing my mind :D Thoughts?

I agree, I think v3 is better. I'll revert to v3 and add the above
documentation change.
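For the "analyze which VMAs benefit the most" step, a throwaway sketch like
the one below is what I have in mind (hypothetical tooling, not part of the
patch; it only relies on the new "KSM:" field and the usual smaps layout):

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[64], line[256], vma[256] = "";
	unsigned long start, end, kb;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/smaps",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* VMA header lines start with the hex address range. */
		if (sscanf(line, "%lx-%lx", &start, &end) == 2)
			strncpy(vma, line, sizeof(vma) - 1);
		/* Report only VMAs with a nonzero KSM value. */
		else if (sscanf(line, "KSM: %lu kB", &kb) == 1 && kb)
			printf("KSM %8lu kB in %s", kb, vma);
	}
	fclose(f);
	return 0;
}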
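And regarding "temporarily disabling the shared zeropage" so a user can
measure the actual KSM benefit with v3 semantics: the existing knob for that
is /sys/kernel/mm/ksm/use_zero_pages. A sketch (needs root; saving and
restoring the previous value is omitted):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/*
	 * Disable the KSM shared zeropage; zero-filled pages merged from
	 * now on are backed by regular KSM pages, which the "KSM:"
	 * counter (v3 semantics: actual KSM pages only) does include.
	 */
	int fd = open("/sys/kernel/mm/ksm/use_zero_pages", O_WRONLY);

	if (fd < 0 || write(fd, "0", 1) != 1) {
		perror("use_zero_pages");
		return 1;
	}
	close(fd);
	return 0;
}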