Re: [PATCH] KVM: VMX: Nop emulation of MSR_IA32_POWER_CTL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On 15 Apr 2019, at 21:17, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> 
> On Mon, Apr 15, 2019 at 06:45:26PM +0300, Liran Alon wrote:
>> Since commits 668fffa3f838 ("kvm: better MWAIT emulation for guests”)
>> and 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts”),
>> KVM was modified to allow an admin to configure certain guests to execute
>> MONITOR/MWAIT inside guest without being intercepted by host.
>> 
>> This is useful in case admin wishes to allocate a dedicated logical
>> processor for each vCPU thread. Thus, making it safe for guest to
>> completely control the power-state of the logical processor.
>> 
>> The ability to use this new KVM capability was introduced to QEMU by
>> commits 6f131f13e68d ("kvm: support -overcommit cpu-pm=on|off”) and
>> 2266d4431132 ("i386/cpu: make -cpu host support monitor/mwait”).
>> 
>> However, exposing MONITOR/MWAIT to a Linux guest may cause it's intel_idle
>                                                             ^^^^
>                                                             its
> 
> English is a wonderful language…

Yeah OK… :\

> 
>> kernel module to execute c1e_promotion_disable() which will attempt to
>> RDMSR/WRMSR from/to MSR_IA32_POWER_CTL to manipulate the "C1E Enable"
>> bit. This behaviour was introduced by commit
>> 32e9518005c8 ("intel_idle: export both C1 and C1E”).
> 
> Technically, I think this is a Qemu bug.  KVM reports all zeros for
> CPUID_MWAIT_LEAF when userspace queries KVM_GET_SUPPORTED_CPUID and
> KVM_GET_EMULATED_CPUID.  And I think that's correct/desired, supporting
> MONITOR/MWAIT sub-features should be a separate enabling patch set.

At some point in time Jim added commit df9cb9cc5bcd ("kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation”)
which added the following paragraph to documentation:
"Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
expose cpuid features (e.g. MONITOR) which are not supported by kvm in
its default configuration. If userspace enables such capabilities, it
is responsible for modifying the results of this ioctl appropriately.”

It’s indeed not clear what it means to “modify the results of this ioctl *appropriately*” right?
It can either mean you just expose in CPUID[EAX=1].ECX support for MONITOR/MWAIT
or that you also expose CPUID_MWAIT_LEAF (CPUID[EAX=5]).
Both regardless of the value returned from KVM_GET_SUPPORTED_CPUID ioctl.

Having said that, I tend to agree with you.
Instead of emulating this MSR in KVM, I think it it preferred to change QEMU to expose MONITOR/MWAIT support in CPUID[EAX=1].ECX
but in CPUID[EAX=5] init everything as in host besides ECX[0] which will be set to 0 to report we don’t support extensions.
(We still want to support range of monitor line size, whether we can treat interrupts as break-events for MWAIT and the supported C substates).
I will create this patch for QEMU.

But in addition, this MSR emulation I think is still required. As guests may still legitimately use this MSR to enable C1E
even when MWAIT extensions is not supported. This is because just executing HLT/MWAIT on all cores in package allows
auto transition to C1E.

So to conclude I think this KVM patch is still required and that we need a QEMU patch in addition.
Do you agree?

> 
> Note, there is a virtualization hole regarding MWAIT as KVM can't
> intercept MWAIT when executed with unsupported hints/features, but
> I don't think that absolves Qemu of wrongdoing.

I think that’s OK as this will just cause the logical processor to ignore MWAIT extension register
Specifying Sub C-state and target C-state. The reason this is OK is because in SDM on MWAIT it says:
"Implementation-specific conditions may cause a processor to ignore the hint and enter a different optimized state”.
So our implementation-specific condition is to always ignore the hint & extension :D

> 
>> Becuase KVM doesn't emulate this MSR, running KVM with ignore_msrs=0
>> will cause the above guest behaviour to raise a #GP which will cause
>> guest to kernel panic.
>> 
>> Therefore, add support for nop emulation of MSR_IA32_POWER_CTL to
>> avoid #GP in guest in this scenario.
>> 
>> Future commits can optimise emulation further by reflecting guest
>> MSR changes to host MSR to provide guest with the ability to
>> fine-tune the dedicated logical processor power-state.
>> 
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>> Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 6 ++++++
>> arch/x86/kvm/vmx/vmx.h | 2 ++
>> arch/x86/kvm/x86.c     | 1 +
>> 3 files changed, 9 insertions(+)
>> 
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 2634ee8c9dc8..6246d782b746 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1696,6 +1696,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 	case MSR_IA32_SYSENTER_ESP:
>> 		msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP);
>> 		break;
>> +	case MSR_IA32_POWER_CTL:
>> +		msr_info->data = vmx->msr_ia32_power_ctl;
>> +		break;
>> 	case MSR_IA32_BNDCFGS:
>> 		if (!kvm_mpx_supported() ||
>> 		    (!msr_info->host_initiated &&
>> @@ -1826,6 +1829,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 	case MSR_IA32_SYSENTER_ESP:
>> 		vmcs_writel(GUEST_SYSENTER_ESP, data);
>> 		break;
>> +	case MSR_IA32_POWER_CTL:
>> +		vmx->msr_ia32_power_ctl = data;
>> +		break;
>> 	case MSR_IA32_BNDCFGS:
>> 		if (!kvm_mpx_supported() ||
>> 		    (!msr_info->host_initiated &&
> 
> If KVM does go the route of advertising MWAIT/MONITOR sub-features, then I
> think the MSR needs to be emulated on both Intel and AMD.  Glancing through
> drivers/idle/intel_idle.c, I don't see anything that prevents it from
> successfully probing an "Intel" vCPU that is being emulated on AMD hardware.

Do we even support exposing “Intel” vCPU which is emulated using AMD SVM?
In that case, I agree and I will just move code to kvm_set_msr_common() & kvm_get_msr_common().

Thanks for the insightful review,
-Liran

> 
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 99328954c2fc..e9772850a2a1 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -259,6 +259,8 @@ struct vcpu_vmx {
>> 
>> 	unsigned long host_debugctlmsr;
>> 
>> +	u64 msr_ia32_power_ctl;
>> +
>> 	/*
>> 	 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
>> 	 * msr_ia32_feature_control. FEATURE_CONTROL_LOCKED is always included
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 02c8e095a239..39ee4087f954 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1167,6 +1167,7 @@ static u32 emulated_msrs[] = {
>> 	MSR_PLATFORM_INFO,
>> 	MSR_MISC_FEATURES_ENABLES,
>> 	MSR_AMD64_VIRT_SPEC_CTRL,
>> +	MSR_IA32_POWER_CTL,
>> };
>> 
>> static unsigned num_emulated_msrs;
>> -- 
>> 2.20.1
>> 





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux