David Hildenbrand <david@xxxxxxxxxx> writes:

> On 26.09.23 06:09, Stefan Roesch wrote:
>> This change adds a "smart" page scanning mode for KSM. So far all the
>> candidate pages are continuously scanned to find candidates for
>> de-duplication. There are a considerable number of pages that cannot be
>> de-duplicated. This is costly in terms of CPU. By using smart scanning
>> considerable CPU savings can be achieved.
>>
>> This change takes the history of scanning pages into account and skips
>> the page scanning of certain pages for a while if de-duplication for
>> this page has not been successful in the past.
>>
>> To do this it introduces two new fields in the ksm_rmap_item structure:
>> age and remaining_skips. age is the KSM age and remaining_skips
>> determines how often scanning of this page is skipped. The age field is
>> incremented each time the page is scanned and the page cannot be
>> de-duplicated. The age update is capped at U8_MAX.
>>
>> How often a page is skipped depends on how often de-duplication has
>> been tried so far, and the number of skips is currently limited to 8.
>> This value has been shown to be effective with different workloads.
>>
>> The feature is currently disabled by default and can be enabled with the
>> new smart_scan knob.
>>
>> The feature has been shown to be very effective: up to 25% of the page
>> scans can be eliminated; the pages_to_scan rate can be reduced by
>> 40 - 50% and a similar de-duplication rate can be maintained.
>
> Thinking about it, what are the cons of just enabling this always and not
> exposing new toggles? Alternatively, we could make this a compile-time option.
>
> In general, LGTM, just curious if we really have to make this configurable.
>

The only downside I can see is that it might take longer for some pages
to be de-duplicated: a new candidate page is added, but its duplicate is
skipped in this round, so the merge is delayed until the duplicate is
scanned again.

I tested with more than one workload, but it might be useful to get some
data with additional workloads. I was thinking of enabling it by default
after one or two releases.
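
To make the trade-off above concrete, here is a rough userspace sketch of
the skip decision as the quoted description explains it. It is not the
patch itself: struct scan_item, skips_for_age(), should_skip() and the age
thresholds are hypothetical names and values chosen for illustration. Only
the two fields, the U8_MAX cap on age and the limit of 8 skips come from
the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two new fields in ksm_rmap_item. */
struct scan_item {
	uint8_t age;             /* unsuccessful scans so far, saturates at UINT8_MAX */
	uint8_t remaining_skips; /* scan rounds this item will still be skipped */
};

#define MAX_SKIPS 8 /* upper limit named in the patch description */

/* Assumed mapping: the older the item, the more rounds it is skipped. */
static uint8_t skips_for_age(uint8_t age)
{
	if (age <= 3)
		return 1;
	if (age <= 5)
		return 2;
	if (age <= 8)
		return 4;
	return MAX_SKIPS;
}

/*
 * Called once per scan round for an item that has not been merged yet.
 * Returns true if the item should be skipped in this round.
 */
static bool should_skip(struct scan_item *item)
{
	uint8_t age = item->age;

	/* Age the item, but never wrap around past UINT8_MAX. */
	if (item->age != UINT8_MAX)
		item->age++;

	/* Assumption: young items are always scanned so they get a fair chance. */
	if (age < 3)
		return false;

	/* Skip budget used up: scan now and compute the budget for next time. */
	if (item->remaining_skips == 0) {
		item->remaining_skips = skips_for_age(age);
		return false;
	}

	/* Still inside the skip window. */
	item->remaining_skips--;
	return true;
}
```

With these assumed thresholds a page that never merges is scanned in each
of its first four rounds, then skipped once, then twice, and so on up to 8
rounds between scans. That window is also the worst-case extra delay
before a newly added duplicate can be matched against it.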