On Mon, 13 Mar 2023 14:03:33 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 10.02.23 02:15, yang.yang29@xxxxxxxxxx wrote:
> > From: xu xin <xu.xin16@xxxxxxxxxx>
> >
>
> Hi,
>
> sorry for the late follow-up. Still wrapping my head around this and
> possible alternatives. I hope we'll get some comments from others as
> well about the basic approach.
>
> > The core idea of this patch set is to enable users to perceive the number of any
> > pages merged by KSM, regardless of whether the use_zero_pages switch has been
> > turned on, so that users can know how much free memory increase is really due to
> > their madvise(MERGEABLE) actions. But the problem is, when enabling use_zero_pages,
> > all empty pages will be merged with kernel zero pages instead of with each
> > other as they are when use_zero_pages is disabled, and then these zero-pages are
> > no longer monitored by KSM.
> >
> > My motivation for doing this comes down to three points:
> >
> > 1) MADV_UNMERGEABLE and other ways to trigger unsharing will *not*
> >    unshare the shared zeropage as placed by KSM (which is against the
> >    MADV_UNMERGEABLE documentation at least); see the link:
> >    https://lore.kernel.org/lkml/4a3daba6-18f9-d252-697c-197f65578c44@xxxxxxxxxx/
> >
> > 2) We cannot know how many pages are zero pages placed by KSM when
> >    enabling use_zero_pages, which hides the critical information about
> >    how much actual memory is really saved by KSM. Knowing how many
> >    ksm-placed zero pages there are helps users apply the madvise
> >    (MERGEABLE) policy better because they can see the actual profit
> >    brought by KSM.
> >
> > 3) The zero pages placed by KSM are different from those initial empty pages
> >    (filled with zeros) which are never touched by applications. The former
> >    are actively merged by KSM while the latter have never consumed actual
> >    memory.
> >
>
> I agree with all of the above, but it's still unclear to me if there is
> a real downside to a simpler approach:
>
> (1) Tracking the shared zeropages.
>     That would be fairly easy: whenever
>     we map/unmap a shared zeropage, we simply update the counter.
>
> (2) Unmerging all shared zeropages inside the VMAs during
>     MADV_UNMERGEABLE.
>
> (3) Documenting that MADV_UNMERGEABLE will also unmerge the shared
>     zeropage when toggle xy is flipped.
>
> It's certainly simpler and doesn't rely on the rmap item. See below.

I would surely prefer a simpler approach

>
> > use_zero_pages is useful, not only because of cache colouring as described
> > in the doc, but also because use_zero_pages can accelerate merging empty pages
> > when there are plenty of empty pages (full of zeros), as the time of
> > page-by-page comparisons (unstable_tree_search_insert) is saved. So we hope to
> > implement the support for ksm zero page tracking without affecting the feature
> > of use_zero_pages.
> >
> > Zero pages may be the most common merged pages in actual environments (not only
> > VMs but also other applications like containers). Enabling use_zero_pages in an
> > environment with plenty of empty pages (full of zeros) will be very useful. Users
> > and app developers can also benefit from knowing the proportion of zero pages in
> > all merged pages to optimize applications.
> >
>
> I agree with that point, especially after I read in a paper that KSM
> applied to some applications mainly deduplicates pages filled with 0s.
> So it seems like a reasonable thing to optimize for.
>
> > With the patch series, we can both unshare KSM-placed zero pages accurately
> > and count ksm zero pages with use_zero_pages enabled.
>
> The problem with this approach I see is that it fundamentally relies on
> the rmap/stable-tree to detect whether a zeropage was placed or not.
>
> I was wondering, why we even need an rmap item *at all* anymore. Why
> can't we place the shared zeropage and call it a day (remove the rmap
> item)? Once we placed a shared zeropage, the next KSM scan should better
> just ignore it, it's already deduplicated.
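To make the simpler approach (1)-(3) above a bit more concrete, here is a minimal userspace sketch, not kernel code. Everything in it is an assumption made up for illustration: toy_vma, the slot states, zeropages_mapped and the three helpers do not exist in the kernel. It only models "bump a counter on map/unmap, and turn every shared zeropage in the VMA into a private page on unmerge".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define TOY_VMA_SLOTS 8

enum slot_state { SLOT_NONE, SLOT_PRIVATE, SLOT_SHARED_ZERO };

struct toy_vma {
	enum slot_state slot[TOY_VMA_SLOTS];
};

/* (1) The counter: number of shared-zeropage mappings currently in place. */
static long zeropages_mapped;

/* Map the shared zeropage into slot i; count it only once per slot. */
static void map_shared_zeropage(struct toy_vma *vma, size_t i)
{
	assert(i < TOY_VMA_SLOTS);
	if (vma->slot[i] != SLOT_SHARED_ZERO) {
		vma->slot[i] = SLOT_SHARED_ZERO;
		zeropages_mapped++;
	}
}

/* Unmap slot i; drop the count if it held the shared zeropage. */
static void unmap_slot(struct toy_vma *vma, size_t i)
{
	assert(i < TOY_VMA_SLOTS);
	if (vma->slot[i] == SLOT_SHARED_ZERO)
		zeropages_mapped--;
	vma->slot[i] = SLOT_NONE;
}

/*
 * (2) Model of MADV_UNMERGEABLE: every shared zeropage in the VMA is
 * replaced by a private (zero-filled) page. Returns how many were unmerged.
 */
static size_t unmerge_vma(struct toy_vma *vma)
{
	size_t unmerged = 0;

	for (size_t i = 0; i < TOY_VMA_SLOTS; i++) {
		if (vma->slot[i] == SLOT_SHARED_ZERO) {
			vma->slot[i] = SLOT_PRIVATE;
			zeropages_mapped--;
			unmerged++;
		}
	}
	return unmerged;
}
```

Note that this model cannot tell a KSM-placed zeropage from one placed at fault time; it unmerges both, which is exactly what point (3) would have to document.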
>
> So if most pages we deduplicate are shared zeropages, it would be quite
> interesting to reduce the memory overhead and avoid rmap items, instead
> of building new functionality on top of it?
>
>
>
> If we'd really want to identify whether a zeropage was deduplicated by
> KSM, we could try storing that information inside the PTE instead of

this is interesting, but needs caution, for the reason you mention below

> inside the RMAP. Then, we could directly adjust the counter when zapping
> the shared zeropage or during MADV_DONTNEED/when unmerging.
>
> Eventually, we could simply say that
> * !pte_dirty(): zeropage placed during fault
> * pte_dirty(): zeropage placed by KSM
>
> Then it would also be easy to adjust counters and unmerge. We'd limit
> this handling to known-working architectures initially (sparc64 still has
> the issue that pte_mkdirty() will set a pte writable ... and my patch to
> fix that was not merged yet). We'd have to double-check all
> pte_mkdirty/pte_mkclean() callsites.

this will be... interesting

>
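For what it's worth, the dirty-bit convention proposed above can be sketched in a few lines of userspace C. This is only a toy model under stated assumptions: toy_pte_t, the TOY_PTE_* bit layout, ksm_zero_pages and all helper names are hypothetical and do not correspond to any real architecture or kernel API. It shows the idea that the dirty bit on a zeropage mapping alone decides whether the counter is touched when the entry is zapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE with a made-up bit layout; not any real architecture's format. */
typedef uint32_t toy_pte_t;

#define TOY_PTE_PRESENT  (1u << 0)
#define TOY_PTE_ZEROPAGE (1u << 1)	/* maps the shared zeropage */
#define TOY_PTE_DIRTY    (1u << 2)	/* on a zeropage: "placed by KSM" */

/* Hypothetical counter of KSM-placed zeropages. */
static long ksm_zero_pages;

/* Zeropage placed during a read fault: left clean, not counted. */
static toy_pte_t zeropage_pte_fault(void)
{
	return TOY_PTE_PRESENT | TOY_PTE_ZEROPAGE;
}

/* Zeropage placed by KSM: marked dirty and counted. */
static toy_pte_t zeropage_pte_ksm(void)
{
	ksm_zero_pages++;
	return TOY_PTE_PRESENT | TOY_PTE_ZEROPAGE | TOY_PTE_DIRTY;
}

/* True only for a present zeropage mapping that carries the dirty bit. */
static bool is_ksm_zeropage(toy_pte_t pte)
{
	const toy_pte_t mask =
		TOY_PTE_PRESENT | TOY_PTE_ZEROPAGE | TOY_PTE_DIRTY;

	return (pte & mask) == mask;
}

/* Zap (or unmerge) an entry: adjust the counter only for KSM-placed ones. */
static toy_pte_t zap_pte(toy_pte_t pte)
{
	if (is_ksm_zeropage(pte))
		ksm_zero_pages--;
	return 0;
}
```

The fragile part is the same one mentioned above: anything else that sets or clears the dirty bit on such an entry would silently corrupt the count, which is why every pte_mkdirty/pte_mkclean() callsite would need checking.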