On 3/3/25 02:23, Yan Zhao wrote:
On Sat, Mar 01, 2025 at 02:34:26AM -0500, Paolo Bonzini wrote:
From: Yan Zhao <yan.y.zhao@xxxxxxxxx>
Introduce supported_quirks in kvm_caps to store platform-specific force-enabled
quirks. Any quirk removed from kvm_caps.supported_quirks will never be
included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if
passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).
Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
Message-ID: <20250224070832.31394-1-yan.y.zhao@xxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
arch/x86/kvm/x86.c | 7 ++++---
arch/x86/kvm/x86.h | 2 ++
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd0a44e59314..a97e58916b6a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
 		break;
 	case KVM_CAP_DISABLE_QUIRKS2:
-		r = KVM_X86_VALID_QUIRKS;
+		r = kvm_caps.supported_quirks;
As raised in [1], it is confusing for KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
to be present on AMD platforms but absent on Intel platforms without
self-snoop.
To make it less confusing, let's rename it to
KVM_X86_QUIRK_IGNORE_GUEST_PAT. So if userspace wants to say "I don't
want KVM to ignore guest's PAT!", it can do so easily:
- it can check unconditionally that the quirk is included in
KVM_CAP_DISABLE_QUIRKS2, and the check will pass on both Intel with
self-snoop and AMD;
- it can pass KVM_X86_QUIRK_IGNORE_GUEST_PAT unconditionally to
KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2), knowing that PAT will be honored.
But KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2) will *not* include the
quirk on Intel without self-snoop, which lets userspace detect the
condition and raise an error. This is better than introducing a new
case in the API "the bit is returned by KVM_CHECK_EXTENSION, but
rejected by KVM_ENABLE_CAP". Such a new case would inevitably lead to
KVM_CAP_DISABLE_QUIRKS3. :)
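To illustrate the userspace side of this, here is a minimal sketch (not
taken from QEMU or the selftests). It models only the mask test on the
value returned by KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2). The macro
EXAMPLE_QUIRK_IGNORE_GUEST_PAT, its bit position, and the helper name are
assumptions made for the example; real code would take
KVM_X86_QUIRK_IGNORE_GUEST_PAT from <linux/kvm.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit, for illustration only. Real code should use
 * KVM_X86_QUIRK_IGNORE_GUEST_PAT from the uapi headers instead.
 */
#define EXAMPLE_QUIRK_IGNORE_GUEST_PAT (UINT64_C(1) << 9)

/*
 * 'supported' stands for the mask returned by
 * KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2).
 *
 * Returns true if the quirk is advertised, meaning userspace may disable
 * it and expect guest PAT to be honored. Returns false if the quirk is
 * not advertised (e.g. Intel without self-snoop in the scheme discussed
 * here), in which case userspace can raise an error.
 */
static bool guest_pat_can_be_honored(uint64_t supported)
{
	return (supported & EXAMPLE_QUIRK_IGNORE_GUEST_PAT) != 0;
}
```

In a real VMM, 'supported' would come from the KVM_CHECK_EXTENSION ioctl,
and on success the quirk bit would go into args[0] of
KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).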
Or what about introducing kvm_caps.force_enabled_quirks?
if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
	kvm_caps.force_enabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
That would also make it harder for userspace to understand what's going on.
[1] https://lore.kernel.org/all/Z8UBpC76CyxCIRiU@xxxxxxxxxxxxxxxxxxxxxxxxx/
 		break;
 	case KVM_CAP_X86_NOTIFY_VMEXIT:
 		r = kvm_caps.has_notify_vmexit;
@@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	switch (cap->cap) {
 	case KVM_CAP_DISABLE_QUIRKS2:
 		r = -EINVAL;
-		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
+		if (cap->args[0] & ~kvm_caps.supported_quirks)
 			break;
 		fallthrough;
 	case KVM_CAP_DISABLE_QUIRKS:
-		kvm->arch.disabled_quirks = cap->args[0];
+		kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;
Will this break the uapi of KVM_CAP_DISABLE_QUIRKS?
My understanding is that only KVM_CAP_DISABLE_QUIRKS2 filters out invalid
quirks.
The only difference between KVM_CAP_DISABLE_QUIRKS and
KVM_CAP_DISABLE_QUIRKS2 is that, with KVM_CAP_DISABLE_QUIRKS, invalid
values do not cause an error. Userspace cannot know what is in
kvm->arch.disabled_quirks, so KVM can change the value that is stored there.
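As a rough model of the behavior with this patch applied, and not the
kernel code itself: the sketch below mirrors the fallthrough in
kvm_vm_ioctl_enable_cap() using plain integers. struct vm_model,
enum cap_model and enable_cap_model() are hypothetical names introduced
only for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VM state (kvm->arch.disabled_quirks). */
struct vm_model {
	uint64_t disabled_quirks;
};

/* Hypothetical stand-ins for the two capabilities. */
enum cap_model {
	CAP_DISABLE_QUIRKS,
	CAP_DISABLE_QUIRKS2,
};

/*
 * 'supported' plays the role of kvm_caps.supported_quirks.
 *
 * CAP_DISABLE_QUIRKS2 rejects any bit outside 'supported' with -EINVAL
 * and leaves the stored value untouched. CAP_DISABLE_QUIRKS accepts any
 * value but silently drops the unsupported bits before storing it.
 */
static int enable_cap_model(struct vm_model *vm, enum cap_model cap,
			    uint64_t arg, uint64_t supported)
{
	if (cap == CAP_DISABLE_QUIRKS2 && (arg & ~supported))
		return -EINVAL;

	vm->disabled_quirks = arg & supported;
	return 0;
}
```

With supported == 0x3 and arg == 0x5, the QUIRKS2 variant fails with
-EINVAL, while the legacy variant succeeds and stores 0x1.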
Paolo
 		r = 0;
 		break;
 	case KVM_CAP_SPLIT_IRQCHIP: {
@@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
 	kvm_caps.inapplicable_quirks = 0;
 	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9af199c8e5c8..f2672b14388c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -34,6 +34,8 @@ struct kvm_caps {
 	u64 supported_xcr0;
 	u64 supported_xss;
 	u64 supported_perf_cap;
+
+	u64 supported_quirks;
 	u64 inapplicable_quirks;
 };
--
2.43.5