Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v2)

Anthony Liguori <anthony@xxxxxxxxxxxxx> · Thu, 02 Dec 2010 09:23:43 -0600

On 12/02/2010 08:39 AM, lidong chen wrote:
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling.

i could not understand why need this? can you tell more detailedly?
   

If you run 4 guests on a CPU, and they're all trying to consume 100% 
CPU, all things being equal, you'll get ~25% CPU for each guest.

However, if one guest is idle, you'll get something like 1% 32% 33% 
32%.  This characteristic is usually desirable because it increase 
aggregate throughput but in some circumstances, determinism is more 
desirable than aggregate throughput.

This patch essentially makes guest execution non-work conserving by 
making it appear to the scheduler that each guest wants 100% CPU even 
though they may be idling.

That means that regardless of what each guest is doing, if you have four 
guests on one CPU, each will get ~25% CPU[1].

[1] there are corner cases around things like forced sleep due to PFs 
and the like.  The goal is not for 100% determinism but more to at least 
obtain more significantly more determinism than we have now.

Regards,

Anthony Liguori

thanks.


2010/12/2 Anthony Liguori<aliguori@xxxxxxxxxx>:
   
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling.  There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.

Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.  A similar option is possible
for AMD but I don't have easy access to AMD test hardware.

Signed-off-by: Anthony Liguori<aliguori@xxxxxxxxxx>
---
v1 ->  v2
  - Rename parameter to yield_on_hlt
  - Remove __read_mostly

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index caa967e..d8310e4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -69,6 +69,9 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
  static int __read_mostly vmm_exclusive = 1;
  module_param(vmm_exclusive, bool, S_IRUGO);

+static int yield_on_hlt = 1;
+module_param(yield_on_hlt, bool, S_IRUGO);
+
  #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST                          \
        (X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
  #define KVM_GUEST_CR0_MASK                                             \
@@ -1419,7 +1422,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
                                &_pin_based_exec_control)<  0)
                return -EIO;

-       min = CPU_BASED_HLT_EXITING |
+       min =
  #ifdef CONFIG_X86_64
              CPU_BASED_CR8_LOAD_EXITING |
              CPU_BASED_CR8_STORE_EXITING |
@@ -1432,6 +1435,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
              CPU_BASED_MWAIT_EXITING |
              CPU_BASED_MONITOR_EXITING |
              CPU_BASED_INVLPG_EXITING;
+
+       if (yield_on_hlt)
+               min |= CPU_BASED_HLT_EXITING;
+
        opt = CPU_BASED_TPR_SHADOW |
              CPU_BASED_USE_MSR_BITMAPS |
              CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
--
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

     
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
   

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html