On 18.09.23 18:18, Stefan Roesch wrote:
David Hildenbrand <david@xxxxxxxxxx> writes:
On 12.09.23 19:52, Stefan Roesch wrote:
This change adds a "smart" page scanning mode for KSM. So far all the
candidate pages are continuously scanned to find candidates for
de-duplication. There are a considerable number of pages that cannot be
de-duplicated. This is costly in terms of CPU. By using smart scanning
considerable CPU savings can be achieved.
This change takes the history of scanning pages into account and skips
the page scanning of certain pages for a while if de-duplication for
this page has not been successful in the past.
To do this it introduces two new fields in the ksm_rmap_item structure:
age and skip_age. age is the KSM age and skip_age is the age until which
page scanning of this page is skipped. The age field is incremented
each time the page is scanned and the page cannot be de-duplicated.
How often a page is skipped depends on how often de-duplication has
been tried so far, and the number of skips is currently limited to 8.
This value has been shown to be effective with different workloads.
The feature is currently disabled by default and can be enabled with the
new smart_scan knob.
The feature has been shown to be very effective: up to 25% of the page scans
can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
a similar de-duplication rate can be maintained.
Signed-off-by: Stefan Roesch <shr@xxxxxxxxxxxx>
---
mm/ksm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/mm/ksm.c b/mm/ksm.c
index 981af9c72e7a..bfd5087c7d5a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -56,6 +56,8 @@
#define DO_NUMA(x) do { } while (0)
#endif
+typedef u8 rmap_age_t;
+
/**
* DOC: Overview
*
@@ -193,6 +195,8 @@ struct ksm_stable_node {
* @node: rb node of this rmap_item in the unstable tree
* @head: pointer to stable_node heading this list in the stable tree
* @hlist: link into hlist of rmap_items hanging off that stable_node
+ * @age: number of scan iterations since creation
+ * @skip_age: skip rmap item until age reaches skip_age
*/
struct ksm_rmap_item {
struct ksm_rmap_item *rmap_list;
@@ -212,6 +216,8 @@ struct ksm_rmap_item {
struct hlist_node hlist;
};
};
+ rmap_age_t age;
+ rmap_age_t skip_age;
};
#define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */
@@ -281,6 +287,9 @@ static unsigned int zero_checksum __read_mostly;
/* Whether to merge empty (zeroed) pages with actual zero pages */
static bool ksm_use_zero_pages __read_mostly;
+/* Skip pages that couldn't be de-duplicated previously */
+static bool ksm_smart_scan;
+
/* The number of zero pages which is placed by KSM */
unsigned long ksm_zero_pages;
@@ -2305,6 +2314,45 @@ static struct ksm_rmap_item
*get_next_rmap_item(struct ksm_mm_slot *mm_slot,
return rmap_item;
}
+static unsigned int inc_skip_age(rmap_age_t age)
+{
+ if (age <= 3)
+ return 1;
+ if (age <= 5)
+ return 2;
+ if (age <= 8)
+ return 4;
+
+ return 8;
+}
+
+static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
+{
+ rmap_age_t age;
+
+ if (!ksm_smart_scan)
+ return false;
+
+ if (PageKsm(page))
+ return false;
I'm a bit confused about this check here. scan_get_next_rmap_item() would return
a PageKsm() page and call cmp_and_merge_page().
cmp_and_merge_page() says: "first see if page can be merged into the stable
tree"
... but shouldn't a PageKsm page *already* be in the stable tree?
Maybe that's what cmp_and_merge_page() does via:
kpage = stable_tree_search(page);
if (kpage == page && rmap_item->head == stable_node) {
put_page(kpage);
return;
}
Hoping you can enlighten me :)
The above description sounds correct. During each scan we go through all
the candidate pages and this includes rmap_items that map to KSM pages.
The above check simply skips these pages.
Can we add a comment why we don't skip them? Like
/*
* Never skip pages that are already KSM pages; cmp_and_merge_page()
* will essentially ignore them, but we still have to process them
* properly.
*/
+
+ age = rmap_item->age++;
Can't we overflow here? Is that desired, or would you want to stop at the
maximum you can store?
Yes, we can overflow here and it was a deliberate choice. If we overflow
after trying unsuccessfully 255 times, we restart with shorter
skip values, but that should be fine. In return we avoid an if statement.
The age is defined as unsigned.
Can we make that explicit instead? Dealing with implicit overflows
really makes the code harder to grasp.
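For illustration only, a minimal userspace sketch of two ways the
increment could be made explicit. The helper names
age_post_inc_saturating() and age_post_inc_wrapping() are hypothetical
and not part of the patch, and uint8_t stands in for the kernel's u8.
Both return the age before the increment, like "rmap_item->age++" does.

```c
#include <stdint.h>

typedef uint8_t rmap_age_t;	/* userspace stand-in for the kernel's u8 */

/*
 * Hypothetical helper: post-increment that stops at the maximum value
 * instead of wrapping. Returns the age before the increment.
 */
static rmap_age_t age_post_inc_saturating(rmap_age_t *age)
{
	rmap_age_t old = *age;

	if (old != UINT8_MAX)
		(*age)++;

	return old;
}

/*
 * Hypothetical helper: post-increment that wraps to 0 after the maximum
 * value, with the wraparound spelled out rather than implied by the type.
 * Returns the age before the increment.
 */
static rmap_age_t age_post_inc_wrapping(rmap_age_t *age)
{
	rmap_age_t old = *age;

	*age = (old == UINT8_MAX) ? 0 : (rmap_age_t)(old + 1);

	return old;
}
```

The saturating variant costs the if statement mentioned above; the
wrapping variant keeps the current behaviour but documents it at the
point where it happens.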
+ if (age < 3)
+ return false;
+
+ if (rmap_item->skip_age == age) {
+ rmap_item->skip_age = 0;
+ return false;
+ }
+
+ if (rmap_item->skip_age == 0) {
+ rmap_item->skip_age = age + inc_skip_age(age);
Can't you overflow here as well?
Yes, you can. See the above discussion. This skip_age is also an
unsigned value.
Ditto.
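As a rough userspace sketch of what "explicit" could mean for this
assignment: in C, "age + inc_skip_age(age)" is evaluated as unsigned int
and only truncated when stored into the u8 field. The helper
next_skip_age() below is hypothetical and not part of the patch;
inc_skip_age() is copied from the hunk above, and uint8_t stands in for
the kernel's u8.

```c
#include <stdint.h>

typedef uint8_t rmap_age_t;	/* userspace stand-in for the kernel's u8 */

/* Copied from the patch under review. */
static unsigned int inc_skip_age(rmap_age_t age)
{
	if (age <= 3)
		return 1;
	if (age <= 5)
		return 2;
	if (age <= 8)
		return 4;

	return 8;
}

/*
 * Hypothetical helper, not part of the patch: compute the next skip
 * target with the wraparound spelled out instead of relying on the
 * implicit truncation in the assignment to a u8.
 */
static rmap_age_t next_skip_age(rmap_age_t age)
{
	unsigned int target = (unsigned int)age + inc_skip_age(age);

	return (rmap_age_t)(target % (UINT8_MAX + 1u));
}
```

With this, an age of 250 gives a target of 258, which wraps to 2. An age
of 248 wraps to exactly 0, the value the code above treats as "no skip
target set"; whether that corner case matters is for the patch author to
judge.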
+ remove_rmap_item_from_tree(rmap_item);
Can you enlighten me why that is required?
This is required for the age calculation and the BUG_ON check in
remove_rmap_item_from_tree(). If we don't call remove_rmap_item_from_tree(),
we will hit the BUG_ON for the skipped pages later on.
I see, thanks!
--
Cheers,
David / dhildenb