Re: [PATCH 1/2] x86/idle: add halt poll for halt idle

On Thu, Jun 22, 2017 at 11:22:13AM +0000, root wrote:
> From: Yang Zhang <yang.zhang.wz@xxxxxxxxx>
> 
> This patch introduces a new mechanism to poll for a while before
> entering the idle state.
> 
> David gave a talk at KVM Forum describing the problem that current KVM
> VMs have when running some message-passing workloads. There is also
> some work to improve the performance in KVM, like halt polling in KVM.
> But we still have 4 MSR writes and a HLT vmexit when going into halt idle,
> which introduce a lot of latency.
> 
> Halt polling in KVM provides the capability to not schedule out a VCPU when
> it is the only task on its pCPU. Unlike that, this patch lets the VCPU poll
> for a while if there is no work inside the VCPU, to eliminate the heavy
> vmexits when entering and leaving idle. The potential impact is that it
> costs more CPU cycles, since we are polling, and it may affect other tasks
> waiting on the same physical CPU in the host.

I wonder whether you considered doing this in an idle driver.
I have a prototype patch combining this with mwait within guest -
I can post it if you are interested.


> Here is the data I get when running the contextswitch benchmark
> (https://github.com/tsuna/contextswitch)
> 
> before patch:
> 2000000 process context switches in 4822613801ns (2411.3ns/ctxsw)
> 
> after patch:
> 2000000 process context switches in 3584098241ns (1792.0ns/ctxsw)
> 
> Signed-off-by: Yang Zhang <yang.zhang.wz@xxxxxxxxx>
> ---
>  Documentation/sysctl/kernel.txt | 10 ++++++++++
>  arch/x86/kernel/process.c       | 21 +++++++++++++++++++++
>  include/linux/kernel.h          |  3 +++
>  kernel/sched/idle.c             |  3 +++
>  kernel/sysctl.c                 |  9 +++++++++
>  5 files changed, 46 insertions(+)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index bac23c1..4e71bfe 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -63,6 +63,7 @@ show up in /proc/sys/kernel:
>  - perf_event_max_stack
>  - perf_event_max_contexts_per_stack
>  - pid_max
> +- poll_threshold_ns        [ X86 only ]
>  - powersave-nap               [ PPC only ]
>  - printk
>  - printk_delay
> @@ -702,6 +703,15 @@ kernel tries to allocate a number starting from this one.
>  
>  ==============================================================
>  
> +poll_threshold_ns: (X86 only)
> +
> +This parameter is used to control the max wait time to poll before going
> +into real idle state. By default, the value is 0, which means don't poll.
> +It is recommended to change the value to non-zero if running latency-bound
> +workloads in a VM.
> +
> +==============================================================
> +
>  powersave-nap: (PPC only)
>  
>  If set, Linux-PPC will use the 'nap' mode of powersaving,
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0bb8842..6361783 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -39,6 +39,10 @@
>  #include <asm/desc.h>
>  #include <asm/prctl.h>
>  
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +unsigned long poll_threshold_ns;
> +#endif
> +
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
>   * no more per-task TSS's. The TSS size is kept cacheline-aligned
> @@ -313,6 +317,23 @@ static inline void play_dead(void)
>  }
>  #endif
>  
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +void arch_cpu_idle_poll(void)
> +{
> +	ktime_t cur, stop;
> +
> +	if (poll_threshold_ns) {
> +		cur = ktime_get();
> +		stop = ktime_add_ns(cur, poll_threshold_ns);
> +		do {
> +			if (need_resched())
> +				break;
> +			cur = ktime_get();
> +		} while (ktime_before(cur, stop));
> +	}
> +}
> +#endif
> +
>  void arch_cpu_idle_enter(void)
>  {
>  	tsc_verify_tsc_adjust(false);
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 13bc08a..04cf774 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -460,6 +460,9 @@ extern __scanf(2, 0)
>  extern int sysctl_panic_on_stackoverflow;
>  
>  extern bool crash_kexec_post_notifiers;
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +extern unsigned long poll_threshold_ns;
> +#endif
>  
>  /*
>   * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 2a25a9e..e789f99 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -74,6 +74,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
>  }
>  
>  /* Weak implementations for optional arch specific functions */
> +void __weak arch_cpu_idle_poll(void) { }
>  void __weak arch_cpu_idle_prepare(void) { }
>  void __weak arch_cpu_idle_enter(void) { }
>  void __weak arch_cpu_idle_exit(void) { }
> @@ -219,6 +220,8 @@ static void do_idle(void)
>  	 */
>  
>  	__current_set_polling();
> +	arch_cpu_idle_poll();
> +
>  	tick_nohz_idle_enter();
>  
>  	while (!need_resched()) {
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4dfba1a..9174d57 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1203,6 +1203,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
>  		.extra2		= &one,
>  	},
>  #endif
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +	{
> +		.procname	= "poll_threshold_ns",
> +		.data		= &poll_threshold_ns,
> +		.maxlen		= sizeof(unsigned long),
> +		.mode		= 0644,
> +		.proc_handler	= proc_doulongvec_minmax,
> +	},
> +#endif
>  	{ }
>  };
>  
> -- 
> 1.8.3.1