On Mon, Dec 18, 2023, at 3:29 AM, David Hildenbrand wrote:
> On 13.12.23 19:27, Stefan Roesch wrote:
>> This adds the ksm advisor. The ksm advisor automatically manages the
>> pages_to_scan setting to achieve a target scan time. The target scan
>> time defines how many seconds it should take to scan all the candidate
>> KSM pages. In other words the pages_to_scan rate is changed by the
>> advisor to achieve the target scan time. The algorithm has a max and min
>> value to:
>> - guarantee responsiveness to changes
>> - limit CPU resource consumption
>>
>> The respective parameters are:
>> - ksm_advisor_target_scan_time (how many seconds a scan should take)
>> - ksm_advisor_max_cpu (maximum value for cpu percent usage)
>>
>> - ksm_advisor_min_pages (minimum value for pages_to_scan per batch)
>> - ksm_advisor_max_pages (maximum value for pages_to_scan per batch)
>>
>> The algorithm calculates the change value based on the target scan time
>> and the previous scan time. To avoid perturbations an exponentially
>> weighted moving average is applied.
>>
>> The advisor is managed by two main parameters: target scan time,
>> cpu max time for the ksmd background thread. These parameters determine
>> how aggressive ksmd scans.
>>
>> In addition there are min and max values for the pages_to_scan parameter
>> to make sure that its initial and max values are not set too low or too
>> high. This ensures that it is able to react to changes quickly enough.
>>
>> The default values are:
>> - target scan time: 200 secs
>> - max cpu: 70%
>> - min pages: 500
>> - max pages: 30000
>>
>> By default the advisor is disabled. Currently there are two advisors:
>> none and scan-time.
>>
>> Tests with various workloads have shown considerable CPU savings. Most
>> of the workloads I have investigated have more candidate pages during
>> startup, once the workload is stable in terms of memory, the number of
>> candidate pages is reduced. Without the advisor, the pages_to_scan needs
>> to be sized for the maximum number of candidate pages. So having this
>> advisor definitely helps in reducing CPU consumption.
>>
>> For the Instagram workload, the advisor achieves a 25% CPU reduction.
>> Once the memory is stable, the pages_to_scan parameter gets reduced to
>> about 40% of its max value.
>>
>> Signed-off-by: Stefan Roesch <shr@xxxxxxxxxxxx>
>> ---
>>  mm/ksm.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 160 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 7efcc68ccc6ea..4f7b71a1f3112 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -21,6 +21,7 @@
>>  #include <linux/sched.h>
>>  #include <linux/sched/mm.h>
>>  #include <linux/sched/coredump.h>
>> +#include <linux/sched/cputime.h>
>>  #include <linux/rwsem.h>
>>  #include <linux/pagemap.h>
>>  #include <linux/rmap.h>
>> @@ -248,6 +249,9 @@ static struct kmem_cache *rmap_item_cache;
>>  static struct kmem_cache *stable_node_cache;
>>  static struct kmem_cache *mm_slot_cache;
>>
>> +/* Default number of pages to scan per batch */
>> +#define DEFAULT_PAGES_TO_SCAN 100
>> +
>>  /* The number of pages scanned */
>>  static unsigned long ksm_pages_scanned;
>>
>> @@ -276,7 +280,7 @@ static unsigned int ksm_stable_node_chains_prune_millisecs = 2000;
>>  static int ksm_max_page_sharing = 256;
>>
>>  /* Number of pages ksmd should scan in one batch */
>> -static unsigned int ksm_thread_pages_to_scan = 100;
>> +static unsigned int ksm_thread_pages_to_scan = DEFAULT_PAGES_TO_SCAN;
>>
>>  /* Milliseconds ksmd should sleep between batches */
>>  static unsigned int ksm_thread_sleep_millisecs = 20;
>> @@ -297,6 +301,155 @@ unsigned long ksm_zero_pages;
>>  /* The number of pages that have been skipped due to "smart scanning" */
>>  static unsigned long ksm_pages_skipped;
>>
>> +/* Don't scan more than max pages per batch. */
>> +static unsigned long ksm_advisor_max_pages = 30000;
>> +
>> +/* At least scan this many pages per batch. */
>> +static unsigned long ksm_advisor_min_pages = 500;
>> +
>> +/* Min CPU for scanning pages per scan */
>> +static unsigned int ksm_advisor_min_cpu = 10;
>
> That will never be modified, right? Either mark it const or just turn it
> into a define.
>

Changed it to a define.

> [...]
>
>> +/*
>> + * The scan time advisor is based on the current scan rate and the target
>> + * scan rate.
>> + *
>> + *      new_pages_to_scan = pages_to_scan * (scan_time / target_scan_time)
>> + *
>> + * To avoid perturbations it calculates a change factor of previous changes.
>> + * A new change factor is calculated for each iteration and it uses an
>> + * exponentially weighted moving average. The new pages_to_scan value is
>> + * multiplied with that change factor:
>> + *
>> + *      new_pages_to_scan *= change factor
>> + *
>> + * The new_pages_to_scan value is limited by the cpu min and max values. It
>> + * calculates the cpu percent for the last scan and calculates the new
>> + * estimated cpu percent cost for the next scan. That value is capped by the
>> + * cpu min and max setting.
>> + *
>> + * In addition the new pages_to_scan value is capped by the max and min
>> + * limits.
>> + */
>> +static void scan_time_advisor(void)
>> +{
>> +	unsigned int cpu_percent;
>> +	unsigned long cpu_time;
>> +	unsigned long cpu_time_diff;
>> +	unsigned long cpu_time_diff_ms;
>> +	unsigned long pages;
>> +	unsigned long per_page_cost;
>> +	unsigned long factor;
>> +	unsigned long change;
>> +	unsigned long last_scan_time;
>> +	unsigned long scan_time;
>> +
>> +	/* Convert scan time to seconds */
>> +	scan_time = div_s64(ktime_ms_delta(ktime_get(), advisor_ctx.start_scan),
>> +			    MSEC_PER_SEC);
>> +	scan_time = scan_time ? scan_time : 1;
>> +
>> +	/* Calculate CPU consumption of ksmd background thread */
>> +	cpu_time = task_sched_runtime(current);
>> +	cpu_time_diff = cpu_time - advisor_ctx.cpu_time;
>> +	cpu_time_diff_ms = cpu_time_diff / 1000 / 1000;
>> +
>> +	cpu_percent = (cpu_time_diff_ms * 100) / (scan_time * 1000);
>> +	cpu_percent = cpu_percent ? cpu_percent : 1;
>> +	last_scan_time = prev_scan_time(&advisor_ctx, scan_time);
>
> I'd simply inline prev_scan_time() here and get rid of it. Whatever you
> think is best.
>

I think prev_scan_time is a bit more expressive.

>
> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
>
> --
> Cheers,
>
> David / dhildenb
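
For anyone following the thread without the full patch at hand, here is a rough userspace sketch of the proportional step and the CPU percent calculation described in the quoted comment. It is not the code from the patch: the function names advisor_next_pages() and advisor_cpu_percent() and the clamp helper are made up for illustration, the constants are just the defaults listed in the commit message, and both the exponentially weighted moving average and the capping by min/max CPU are left out.

```c
#include <stdint.h>

/* Defaults quoted in the commit message (compile-time here for simplicity). */
#define TARGET_SCAN_TIME_SECS 200ULL
#define ADVISOR_MIN_PAGES     500ULL
#define ADVISOR_MAX_PAGES     30000ULL

/* Hypothetical helper: clamp v into [lo, hi]. */
static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Hypothetical sketch of
 *   new_pages_to_scan = pages_to_scan * (scan_time / target_scan_time)
 * followed by the min/max pages limits. The multiplication is done
 * before the division so the integer ratio does not truncate to zero.
 */
static uint64_t advisor_next_pages(uint64_t pages_to_scan,
				   uint64_t scan_time_secs)
{
	uint64_t next;

	/* Same guard as the quoted hunk: never work with a zero scan time. */
	if (scan_time_secs == 0)
		scan_time_secs = 1;

	next = pages_to_scan * scan_time_secs / TARGET_SCAN_TIME_SECS;
	return clamp_u64(next, ADVISOR_MIN_PAGES, ADVISOR_MAX_PAGES);
}

/*
 * CPU percent following the arithmetic in the quoted hunk: thread
 * runtime in nanoseconds against wall-clock scan time in seconds,
 * with a floor of 1 percent.
 */
static unsigned int advisor_cpu_percent(uint64_t cpu_time_diff_ns,
					uint64_t scan_time_secs)
{
	uint64_t cpu_time_diff_ms = cpu_time_diff_ns / 1000 / 1000;
	uint64_t percent;

	if (scan_time_secs == 0)
		scan_time_secs = 1;

	percent = (cpu_time_diff_ms * 100) / (scan_time_secs * 1000);
	return percent ? (unsigned int)percent : 1;
}
```

With these defaults, a scan that took 400 seconds at 1000 pages per batch doubles the batch size to 2000, while a scan that finished in 50 seconds would drop to 250 and is therefore held at the 500 page minimum.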