On Wed, 5 Jun 2019 at 00:53, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > > On 21/05/19 08:06, Wanpeng Li wrote: > > From: Wanpeng Li <wanpengli@xxxxxxxxxxx> > > > > Allow guest reads CORE cstate when exposing host CPU power management capabilities > > to the guest. PKG cstate is restricted to avoid a guest to get the whole package > > information in multi-tenant scenario. > > > > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx> > > Cc: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Cc: Liran Alon <liran.alon@xxxxxxxxxx> > > Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx> > > --- > > v1 -> v2: > > * use a separate bit for KVM_CAP_X86_DISABLE_EXITS > > > > Documentation/virtual/kvm/api.txt | 1 + > > arch/x86/include/asm/kvm_host.h | 1 + > > arch/x86/kvm/vmx/vmx.c | 6 ++++++ > > arch/x86/kvm/x86.c | 5 ++++- > > arch/x86/kvm/x86.h | 5 +++++ > > include/uapi/linux/kvm.h | 4 +++- > > tools/include/uapi/linux/kvm.h | 4 +++- > > 7 files changed, 23 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > > index 33cd92d..91fd86f 100644 > > --- a/Documentation/virtual/kvm/api.txt > > +++ b/Documentation/virtual/kvm/api.txt > > @@ -4894,6 +4894,7 @@ Valid bits in args[0] are > > #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0) > > #define KVM_X86_DISABLE_EXITS_HLT (1 << 1) > > #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2) > > +#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3) > > > > Enabling this capability on a VM provides userspace with a way to no > > longer intercept some instructions for improved latency in some > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > index d5457c7..1ce8289 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -882,6 +882,7 @@ struct kvm_arch { > > bool mwait_in_guest; > > bool hlt_in_guest; > > bool pause_in_guest; > > + bool cstate_in_guest; > > > > unsigned long irq_sources_bitmap; > > s64 kvmclock_offset; > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > index 0861c71..da24f18 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) > > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); > > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); > > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); > > + if (kvm_cstate_in_guest(kvm)) { > > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R); > > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R); > > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R); > > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R); > > I think I have changed my mind on the implementation of this, sorry. > > 1) We should emulate these MSRs always, otherwise the guest API changes > between different values of KVM_CAP_X86_DISABLE_EXITS which is not > intended. Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live > migration, so it should be possible to set the MSRs in the host to > change the delta between the host and guest values. > > 2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are > disabled (i.e. exit happens), the MSRs will be purely emulated. > C3/C6/C7 residency will never increase (it will remain the value that is > set by the host). When the VM executes an hlt vmexit, it should save > the current TSC. When it comes back, the C1 residency MSR should be > increased by the time that has passed. > > 3) If KVM_X86_DISABLE_EXITS_HLT is enabled but > KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mait exits happen), > C3/C6/C7 residency will also never increase, but the C1 residency value > should be read using rdmsr from the host, with a delta added from the > host value. > > 4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both > disabled (i.e. mwait exits do not happen), all four residency values > should be read using rdmsr from the host, with a delta added from the > host value. > > 5) If KVM_X86_DISABLE_EXITS_HLT is disabled and > KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense > so it's okay not to be very optimized. In this case, the residency > value should be read as in (4), but hlt vmexits will be accounted as in > (2) so we need to be careful not to double-count the residency during > hlt. This means doing four rdmsr before the beginning of the hlt vmexit > and four at the end of the hlt vmexit. MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we can avoid the complex logic to handle C1 now. :) Regards, Wanpeng Li > > Therefore the data structure should be something like > > struct kvm_residency_msr { > u64 value; > bool delta_from_host; > bool count_with_host; > } > > u64 kvm_residency_read_host(struct kvm_residency_msr *msr) > { > u64 unscaled_value = rdmsrl(msr->index); > // apply TSC scaling... > return ... > } > > u64 kvm_residency_read(struct kvm_residency_msr *msr) > { > return msr->value + > (msr->delta_from_host ? kvm_residency_read_host(msr) : 0); > } > > void kvm_residency_write(struct kvm_residency_msr *msr, > u64 value) > { > msr->value = value - > (msr->delta_from_host ? kvm_residency_read_host(msr) : 0); > } > > // count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT > // or KVM_CAP_DISABLE_EXITS_MWAIT is set > // count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT > is set > void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index, > bool count_with_host) > { > /* Preserve value on calls after the first */ > u64 value = msr->index ? kvm_residency_read(msr) : 0; > msr->delta_from_host = msr->count_with_host = count_with_host; > msr->index = index; > kvm_residency_write(msr, value); > } > > // The following functions are called from hlt vmexits. > > void kvm_residency_start_hlt(struct kvm_residency_msr *msr) > { > if (msr->count_with_host) { > WARN_ON(msr->delta_from_host); > msr->value += kvm_residency_read_host(msr); > msr->delta_from_host = false; > } > } > > // host_tsc_waited is 0 except for MSR_CORE_C1_RES > void kvm_residency_end_hlt(struct kvm_residency_msr *msr, > u64 host_tsc_waited) > { > if (msr->count_with_host) { > WARN_ON(!msr->delta_from_host); > msr->value -= kvm_residency_read_host(msr); > msr->delta_from_host = true; > } > if (host_tsc_waited) { > // ... apply TSC scaling to host_tsc_waited ... > msr->value += ...; > } > } > > Thanks, > > Paolo