On 8/11/2021 4:27 pm, Liuxiangdong wrote:
On 2021/11/8 12:11, Like Xu wrote:
On 8/11/2021 12:07 pm, Liuxiangdong wrote:
On 2021/11/8 11:06, Like Xu wrote:
On 7/11/2021 6:14 pm, Liuxiangdong wrote:
Hi, like and lingshan.
As said, IA32_MISC_ENABLE[7] bit depends on the PMU is enabled for the
guest, so a software
write openration to this bit will be ignored.
But, in this patch, all the openration that writes msr_ia32_misc_enable in
guest could make this bit become 0.
Suppose:
When we start vm with "enable_pmu", vcpu->arch.ia32_misc_enable_msr may be
0x80 first.
And next, guest writes msr_ia32_misc_enable value 0x1.
What we want could be 0x81, but unfortunately, it will be 0x1 because of
"data &= ~MSR_IA32_MISC_ENABLE_EMON;"
And even if guest writes msr_ia32_misc_enable value 0x81, it will be 0x1 also.
Yes and thank you. The fix has been committed on my private tree for a long
time.
What we want is write operation will not change this bit. So, how about this?
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
msr_data *msr_info)
}
break;
case MSR_IA32_MISC_ENABLE:
+ data &= ~MSR_IA32_MISC_ENABLE_EMON;
+ data |= (vcpu->arch.ia32_misc_enable_msr &
MSR_IA32_MISC_ENABLE_EMON);
if (!kvm_check_has_quirk(vcpu->kvm,
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
((vcpu->arch.ia32_misc_enable_msr ^ data) &
MSR_IA32_MISC_ENABLE_MWAIT)) {
if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
How about this for the final state considering PEBS enabling:
case MSR_IA32_MISC_ENABLE: {
u64 old_val = vcpu->arch.ia32_misc_enable_msr;
u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
MSR_IA32_MISC_ENABLE_EMON;
u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
MSR_IA32_MISC_ENABLE_EMON;
Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?
Oops,
u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
Yes. bit[12] is also read-only, so we can keep this bit unchanged also.
And, because write operation will not change this bit by "pmu_mask", do we still
need this if statement?
/* RO bits */
if (!msr_info->host_initiated &&
((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
return 1;
"(old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL" means some operation
tries to change this bit,
so we cannot allow it.
But, if there is no this judgement, "pmu_mask" will still make this bit[12] no
change.
The only difference is that we can not change other bit (except bit 12 and bit
7) once "old_val[12] != data[12]" if there exists this statement
and we can change other bit if there is no judgement.
For both MSR_IA32_MISC_ENABLE_EMON and MSR_IA32_MISC_ENABLE_EMON are read-only,
maybe we can keep
their behavioral consistency. Either both judge, or neither.
One more difference per Intel SDM, I assume:
For Bit 7, Performance Monitoring Available (R)
(R) means that attempts to change this bit will be silent;
For Bit 12, Processor Event Based Sampling (PEBS) Unavailable (RO),
(RO) means that attempts to change this bit will be #GP;
Do you think so?
I'll send the fix after sync with Lingshan.
/* RO bits */
if (!msr_info->host_initiated &&
((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
return 1;
/*
* For a dummy user space, the order of setting vPMU capabilities and
* initialising MSR_IA32_MISC_ENABLE is not strictly guaranteed, so to
* avoid inconsistent functionality we keep the vPMU bits unchanged
here.
*/
Yes. It's a little clearer with comments.
Thanks for your feedback! Enjoy the feature.
data &= ~pmu_mask;
data |= old_val & pmu_mask;
if (!kvm_check_has_quirk(vcpu->kvm,
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
return 1;
vcpu->arch.ia32_misc_enable_msr = data;
kvm_update_cpuid_runtime(vcpu);
} else {
vcpu->arch.ia32_misc_enable_msr = data;
}
break;
}
Or is there anything in your design intention I don't understand?
Thanks!
Xiangdong Liu
On 2021/8/6 21:37, Zhu Lingshan wrote:
From: Like Xu <like.xu@xxxxxxxxxxxxxxx>
On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
detect whether the processor supports performance monitoring facility.
It depends on the PMU is enabled for the guest, and a software write
operation to this available bit will be ignored. The proposal to ignore
the toggle in KVM is the way to go and that behavior matches bare metal.
Cc: Yao Yuan <yuan.yao@xxxxxxxxx>
Signed-off-by: Like Xu <like.xu@xxxxxxxxxxxxxxx>
Reviewed-by: Venkatesh Srinivas <venkateshs@xxxxxxxxxxxx>
Signed-off-by: Zhu Lingshan <lingshan.zhu@xxxxxxxxx>
Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
arch/x86/kvm/vmx/pmu_intel.c | 1 +
arch/x86/kvm/x86.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9efc1a6b8693..d9dbebe03cae 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
if (!pmu->version)
return;
+ vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
perf_get_x86_pmu_capability(&x86_pmu);
pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efd11702465c..f6b6984e26ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
msr_data *msr_info)
}
break;
case MSR_IA32_MISC_ENABLE:
+ data &= ~MSR_IA32_MISC_ENABLE_EMON;
if (!kvm_check_has_quirk(vcpu->kvm,
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
((vcpu->arch.ia32_misc_enable_msr ^ data) &
MSR_IA32_MISC_ENABLE_MWAIT)) {
if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))