On 8/1/2022 2:54 am, Paolo Bonzini wrote:
From: Jing Liu <jing2.liu@xxxxxxxxx>
KVM_GET_SUPPORTED_CPUID should not include any dynamic xstates in
CPUID[0xD] if they have not been requested with prctl. Otherwise
a process which directly passes KVM_GET_SUPPORTED_CPUID to
KVM_SET_CPUID2 would now fail even if it doesn't intend to use a
dynamically enabled feature. Userspace must know that prctl is
required and allocate >4K xstate buffer before setting any dynamic
bit.
Suggested-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Signed-off-by: Jing Liu <jing2.liu@xxxxxxxxx>
Signed-off-by: Yang Zhong <yang.zhong@xxxxxxxxx>
Message-Id: <20220105123532.12586-5-yang.zhong@xxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
Documentation/virt/kvm/api.rst | 4 ++++
arch/x86/kvm/cpuid.c | 9 ++++++---
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6b683dfea8f2..f4ea5e41a4d0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1687,6 +1687,10 @@ userspace capabilities, and with user requirements (for example, the
user may wish to constrain cpuid to emulate older hardware, or for
feature consistency across a cluster).
+Dynamically-enabled feature bits need to be requested with
+``arch_prctl()`` before calling this ioctl. Feature bits that have not
+been requested are excluded from the result.
+
Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
expose cpuid features (e.g. MONITOR) which are not supported by kvm in
its default configuration. If userspace enables such capabilities, it
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f3e6fda6b858..eb52dde5deec 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -815,11 +815,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
goto out;
}
break;
- case 0xd:
- entry->eax &= supported_xcr0;
+ case 0xd: {
+ u64 guest_perm = xstate_get_guest_group_perm();
+
+ entry->eax &= supported_xcr0 & guest_perm;
entry->ebx = xstate_required_size(supported_xcr0, false);
If we choose to exclude unpermitted xfeatures in the entry->eax, why do
we choose to expose the size of unpermitted xfeatures in ebx and ecx?
This seems to be an inconsistency, how about:
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1bd4d560cbdd..193cbf56a5fa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -888,12 +888,12 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array
*array, u32 function)
}
break;
case 0xd: {
- u64 guest_perm = xstate_get_guest_group_perm();
+ u64 supported_xcr0 = supported_xcr0 & xstate_get_guest_group_perm();
- entry->eax &= supported_xcr0 & guest_perm;
+ entry->eax &= supported_xcr0;
entry->ebx = xstate_required_size(supported_xcr0, false);
entry->ecx = entry->ebx;
- entry->edx &= (supported_xcr0 & guest_perm) >> 32;
+ entry->edx &= supported_xcr0 >> 32;
if (!supported_xcr0)
break;
It also helps to fix the CPUID_D_1_EBX and later for (i = 2; i < 64; ++i);
Is there anything I've missed ?
entry->ecx = entry->ebx;
- entry->edx &= supported_xcr0 >> 32;
+ entry->edx &= (supported_xcr0 & guest_perm) >> 32;
if (!supported_xcr0)
break;
@@ -866,6 +868,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->edx = 0;
}
break;
+ }
case 0x12:
/* Intel SGX */
if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) {