On Tue, 12 Sep 2023 10:52:25 -0700 Stefan Roesch <shr@xxxxxxxxxxxx> wrote: > This change adds a "smart" page scanning mode for KSM. So far all the > candidate pages are continuously scanned to find candidates for > de-duplication. There are a considerably number of pages that cannot be > de-duplicated. This is costly in terms of CPU. By using smart scanning > considerable CPU savings can be achieved. > > This change takes the history of scanning pages into account and skips > the page scanning of certain pages for a while if de-deduplication for > this page has not been successful in the past. > > To do this it introduces two new fields in the ksm_rmap_item structure: > age and skip_age. age, is the KSM age and skip_page is the age for how s/skip_page/skip_age/ > long page scanning of this page is skipped. The age field is incremented > each time the page is scanned and the page cannot be de-duplicated. > > How often a page is skipped is dependent how often de-duplication has > been tried so far and the number of skips is currently limited to 8. > This value has shown to be effective with different workloads. > > The feature is currently disable by default and can be enabled with the > new smart_scan knob. > > The feature has shown to be very effective: upt to 25% of the page scans > can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and > a similar de-duplication rate can be maintained. > All seems nice. I'll sit out v1, see what people have to say. Some nits: > --- a/mm/ksm.c > +++ b/mm/ksm.c > > ... > > @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot, > return rmap_item; > } > > +static unsigned int inc_skip_age(rmap_age_t age) > +{ > + if (age <= 3) > + return 1; > + if (age <= 5) > + return 2; > + if (age <= 8) > + return 4; > + > + return 8; > +} "inc_skip_age" sounds like it increments something. Can we give it a better name? And a nice comment explaining its role in life. > +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item) > +{ > + rmap_age_t age; > + > + if (!ksm_smart_scan) > + return false; > + > + if (PageKsm(page)) > + return false; > + > + age = rmap_item->age++; > + if (age < 3) > + return false; > + > + if (rmap_item->skip_age == age) { > + rmap_item->skip_age = 0; > + return false; > + } > + > + if (rmap_item->skip_age == 0) { > + rmap_item->skip_age = age + inc_skip_age(age); > + remove_rmap_item_from_tree(rmap_item); > + } > + > + return true; > +} Would a better name be should_skip_rmap_item()? But even that name implies that the function is idempotent (has no side-effects). Again, an explanatory comment would be good. And simple comments over each non-obvious `if' statement. > > ... >