Sorry for holding you up, I'm slow. and needed to think about this more, On Wed, 20 Aug 2014, Chintan Pandya wrote: > KSM thread to scan pages is scheduled on definite timeout. That wakes up > CPU from idle state and hence may affect the power consumption. Provide > an optional support to use deferrable timer which suites low-power > use-cases. > > Typically, on our setup we observed, 10% less power consumption with some > use-cases in which CPU goes to power collapse frequently. For example, > playing audio on Soc which has HW based Audio encoder/decoder, CPU > remains idle for longer duration of time. This idle state will save > significant CPU power consumption if KSM don't wakes them up > periodically. > > Note that, deferrable timers won't be deferred if any CPU is active and > not in IDLE state. > > By default, deferrable timers is enabled. To disable deferrable timers, > $ echo 0 > /sys/kernel/mm/ksm/deferrable_timer I have now experimented. And, much as I wanted to eliminate the tunable, and just have deferrable timers on, I have come right back to your original position. I was impressed by how quiet ksmd goes when there's nothing much happening on the machine; but equally, disappointed in how slow it then is to fulfil the outstanding merge work. I agree with your original assessment, that not everybody will want deferrable timer, the way it is working at present. I expect that can be fixed, partly by doing more work on wakeup from a deferred timer, according to how long it has been deferred; and partly by not deferring on idle until two passes of the list have been completed. But that's easier said than done, and might turn out to defeat deferring the timer in too many cases: a balance to be found. I hope that you or I or another will find time to do that work soon, maybe before 3.18 but likely not; but I think the advantage of your option is too important to delay it further. Once we are satisfied with later improvements, then I would like to remove the tunable, or at least default it to on. But for now, the default should be off. It's unclear whether I should still worry about ksmd's gross activity, when the system is not actually idle, but all the activity is in non- mergeable processes. I'm ashamed of that hyper-activity, and still think that fixing it would be better than using a deferrable timer - deferring work (particularly intensive work of the kind which ksmd does) until the system is otherwise busy, is not necessarily a good strategy (too much work to do all at once). But fixing that might require ksm hooks in hot locations where nobody else would want them: I'm rather hoping we can strike a good enough balance with your deferrable timer, that nobody will need any better. So, with a few changes here and below, please add my Acked-by: Hugh Dickins <hughd@xxxxxxxxxx> to patches 1 and 2, and resend to akpm - thank you! Here (above), it's restore the text to V3's To enable deferrable timer, $ echo 1 > /sys/kernel/mm/ksm/deferrable_timer > > Signed-off-by: Chintan Pandya <cpandya@xxxxxxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: John Stultz <john.stultz@xxxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > --- > Changes: Sorry, I'm asking for a V4-->V5 reversing V3-->V4! > > V3-->V4: > - Use deferrable timers by default > > V2-->V3: > - Handled error case properly > - Corrected indentation in Documentation > - Fixed build failure > - Removed left over process_timeout() > V1-->V2: > - allowing only valid values to be updated as use_deferrable_timer > - using only 'deferrable' and not 'deferred' > - moved out schedule_timeout code for deferrable timer into timer.c > > > Documentation/vm/ksm.txt | 6 ++++++ > mm/ksm.c | 36 ++++++++++++++++++++++++++++++++++-- > 2 files changed, 40 insertions(+), 2 deletions(-) > > diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt > index f34a8ee..23e26c3 100644 > --- a/Documentation/vm/ksm.txt > +++ b/Documentation/vm/ksm.txt > @@ -87,6 +87,12 @@ pages_sharing - how many more sites are sharing them i.e. how much saved > pages_unshared - how many pages unique but repeatedly checked for merging > pages_volatile - how many pages changing too fast to be placed in a tree > full_scans - how many times all mergeable areas have been scanned > +deferrable_timer - whether to use deferrable timers or not > + e.g. "echo 1 > /sys/kernel/mm/ksm/deferrable_timer" > + Default: 1 (means, we are using deferrable timers. Users > + might want to clear deferrable_timer option if they want > + ksm thread to wakeup CPU to carryout ksm activities thus > + loosing on battery while gaining on memory savings.) Please move this section to between the "sleep_millisecs" and "merge_across_nodes" descriptions, separated from each by a blank line: deferrable_timer - whether to save power by using a deferrable timer e.g. "echo 1 > /sys/kernel/mm/ksm/deferrable_timer" If set to 1, saves power by letting ksmd sleep for longer than sleep_millisecs whenever the system is idle, extending battery life but sometimes saving less memory. Default: 0 (strict sleep timer as in earlier releases) Warning: this default is likely to be changed to 1, and deferrable_timer file then removed, in future releases. > > A high ratio of pages_sharing to pages_shared indicates good sharing, but > a high ratio of pages_unshared to pages_sharing indicates wasted effort. > diff --git a/mm/ksm.c b/mm/ksm.c > index fb75902..af90e30 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -223,6 +223,9 @@ static unsigned int ksm_thread_pages_to_scan = 100; > /* Milliseconds ksmd should sleep between batches */ > static unsigned int ksm_thread_sleep_millisecs = 20; > > +/* Boolean to indicate whether to use deferrable timer or not */ > +static bool use_deferrable_timer = 1; s/ = 1// > + > #ifdef CONFIG_NUMA > /* Zeroed when merging across nodes is not allowed */ > static unsigned int ksm_merge_across_nodes = 1; > @@ -1725,8 +1728,13 @@ static int ksm_scan_thread(void *nothing) > try_to_freeze(); > > if (ksmd_should_run()) { > - schedule_timeout_interruptible( > - msecs_to_jiffies(ksm_thread_sleep_millisecs)); > + signed long to; > + > + to = msecs_to_jiffies(ksm_thread_sleep_millisecs); > + if (use_deferrable_timer) > + schedule_timeout_deferrable_interruptible(to); > + else > + schedule_timeout_interruptible(to); > } else { > wait_event_freezable(ksm_thread_wait, > ksmd_should_run() || kthread_should_stop()); > @@ -2175,6 +2183,29 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr, > } > KSM_ATTR(run); > > +static ssize_t deferrable_timer_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + return snprintf(buf, 8, "%d\n", use_deferrable_timer); I wonder what that "8" was for: return sprintf(buf, "%u\n", use_deferrable_timer); > +} > + > +static ssize_t deferrable_timer_store(struct kobject *kobj, > + struct kobj_attribute *attr, > + const char *buf, size_t count) > +{ > + unsigned long enable; > + int err; > + > + err = kstrtoul(buf, 10, &enable); > + if (err < 0) > + return err; > + if (enable >= 1) It seems you misunderstood me, and never tested $ echo 1 >/sys/kernel/mm/ksm/deferrable_timer bash: echo: write error: Invalid argument s/>=/>/ Or better, follow the rest of ksm.c's parsing, just if (err || enable > 1) return -EINVAL; Thanks, Hugh > + return -EINVAL; > + use_deferrable_timer = enable; > + return count; > +} > +KSM_ATTR(deferrable_timer); > + > #ifdef CONFIG_NUMA > static ssize_t merge_across_nodes_show(struct kobject *kobj, > struct kobj_attribute *attr, char *buf) > @@ -2287,6 +2318,7 @@ static struct attribute *ksm_attrs[] = { > &pages_unshared_attr.attr, > &pages_volatile_attr.attr, > &full_scans_attr.attr, > + &deferrable_timer_attr.attr, > #ifdef CONFIG_NUMA > &merge_across_nodes_attr.attr, > #endif > -- > Chintan Pandya > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a > member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html