> -----Original Message----- > From: Jim Mattson <jmattson@xxxxxxxxxx> > Sent: Friday, June 12, 2020 7:04 PM > To: Moger, Babu <Babu.Moger@xxxxxxx> > Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>; Joerg Roedel <joro@xxxxxxxxxx>; > the arch/x86 maintainers <x86@xxxxxxxxxx>; Sean Christopherson > <sean.j.christopherson@xxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; > Borislav Petkov <bp@xxxxxxxxx>; H . Peter Anvin <hpa@xxxxxxxxx>; Paolo > Bonzini <pbonzini@xxxxxxxxxx>; Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>; > Thomas Gleixner <tglx@xxxxxxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; > kvm list <kvm@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > On Fri, Jun 12, 2020 at 2:47 PM Babu Moger <babu.moger@xxxxxxx> wrote: > > > > > > > > On 6/12/20 3:10 PM, Jim Mattson wrote: > > > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@xxxxxxx> > wrote: > > >> > > >> > > >> > > >>> -----Original Message----- > > >>> From: Jim Mattson <jmattson@xxxxxxxxxx> > > >>> Sent: Thursday, June 11, 2020 6:51 PM > > >>> To: Moger, Babu <Babu.Moger@xxxxxxx> > > >>> Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>; Joerg Roedel > <joro@xxxxxxxxxx>; > > >>> the arch/x86 maintainers <x86@xxxxxxxxxx>; Sean Christopherson > > >>> <sean.j.christopherson@xxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; > > >>> Borislav Petkov <bp@xxxxxxxxx>; H . Peter Anvin <hpa@xxxxxxxxx>; Paolo > > >>> Bonzini <pbonzini@xxxxxxxxxx>; Vitaly Kuznetsov > <vkuznets@xxxxxxxxxx>; > > >>> Thomas Gleixner <tglx@xxxxxxxxxxxxx>; LKML <linux- > kernel@xxxxxxxxxxxxxxx>; > > >>> kvm list <kvm@xxxxxxxxxxxxxxx> > > >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD > > >>> > > >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@xxxxxxx> > wrote: > > >>>> > > >>>> The following intercept is added for INVPCID instruction: > > >>>> Code Name Cause > > >>>> A2h VMEXIT_INVPCID INVPCID instruction > > >>>> > > >>>> The following bit is added to the VMCB layout control area > > >>>> to control intercept of INVPCID: > > >>>> Byte Offset Bit(s) Function > > >>>> 14h 2 intercept INVPCID > > >>>> > > >>>> For the guests with nested page table (NPT) support, the INVPCID > > >>>> feature works as running it natively. KVM does not need to do any > > >>>> special handling in this case. > > >>>> > > >>>> Interceptions are required in the following cases. > > >>>> 1. If the guest tries to disable the feature when the underlying > > >>>> hardware supports it. In this case hypervisor needs to report #UD. > > >>> > > >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will > > >>> result in a #GP, regardless of the intercept bit. If the guest CPUID > > >>> doesn't enumerate the feature, shouldn't the instruction raise #UD > > >>> regardless of CPL? This seems to imply that we should intercept #GP > > >>> and decode the instruction to see if we should synthesize #UD instead. > > >> > > >> Purpose here is to report UD when the guest CPUID doesn't enumerate the > > >> INVPCID feature When Bare-metal supports it. It seems to work fine for > > >> that purpose. You are right. The #GP for CPL>0 takes precedence over > > >> interception. No. I am not planning to intercept GP. > > > > > > WIthout intercepting #GP, you fail to achieve your stated purpose. > > > > I think I have misunderstood this part. I was not inteding to change the > > #GP behaviour. I will remove this part. My intension of these series is to > > handle invpcid in shadow page mode. I have verified that part. Hope I did > > not miss anything else. > > You don't really have to intercept INVPCID when tdp is in use, right? That is correct. Adding the intercept only when tdp is off. > There are certainly plenty of operations for which kvm does not > properly raise #UD when they aren't enumerated in the guest CPUID. > > > >> I will change the text. How about this? > > >> > > >> Interceptions are required in the following cases. > > >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the > > >> underlying hardware supports it, hypervisor needs to report UD. However, > > >> #GP for CPL>0 takes precedence over interception. > > > > > > This text is not internally consistent. In one sentence, you say that > > > "hypervisor needs to report #UD." In the next sentence, you are > > > essentially saying that the hypervisor doesn't need to report #UD. > > > Which is it? > > > > > >>>> 2. When the guest is running with shadow page table enabled, in > > >>>> this case the hypervisor needs to handle the tlbflush based on the > > >>>> type of invpcid instruction type. > > >>>> > > >>>> AMD documentation for INVPCID feature is available at "AMD64 > > >>>> Architecture Programmer’s Manual Volume 2: System Programming, > > >>>> Pub. 24593 Rev. 3.34(or later)" > > >>>> > > >>>> The documentation can be obtained at the links below: > > >>>> Link: > > >>> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a > > >>> > md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7 > > >>> > Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8 > > >>> > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s > > >>> > data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res > > >>> erved=0 > > >>>> Link: > > >>> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. > > >>> > kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m > > >>> > oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488 > > >>> > 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81 > > >>> > 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved= > > >>> 0 > > >>>> > > >>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx> > > >>>> --- > > >>>> arch/x86/include/asm/svm.h | 4 ++++ > > >>>> arch/x86/include/uapi/asm/svm.h | 2 ++ > > >>>> arch/x86/kvm/svm/svm.c | 42 > > >>> +++++++++++++++++++++++++++++++++++++++ > > >>>> 3 files changed, 48 insertions(+) > > >>>> > > >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > > >>>> index 62649fba8908..6488094f67fa 100644 > > >>>> --- a/arch/x86/include/asm/svm.h > > >>>> +++ b/arch/x86/include/asm/svm.h > > >>>> @@ -55,6 +55,10 @@ enum { > > >>>> INTERCEPT_RDPRU, > > >>>> }; > > >>>> > > >>>> +/* Extended Intercept bits */ > > >>>> +enum { > > >>>> + INTERCEPT_INVPCID = 2, > > >>>> +}; > > >>>> > > >>>> struct __attribute__ ((__packed__)) vmcb_control_area { > > >>>> u32 intercept_cr; > > >>>> diff --git a/arch/x86/include/uapi/asm/svm.h > > >>> b/arch/x86/include/uapi/asm/svm.h > > >>>> index 2e8a30f06c74..522d42dfc28c 100644 > > >>>> --- a/arch/x86/include/uapi/asm/svm.h > > >>>> +++ b/arch/x86/include/uapi/asm/svm.h > > >>>> @@ -76,6 +76,7 @@ > > >>>> #define SVM_EXIT_MWAIT_COND 0x08c > > >>>> #define SVM_EXIT_XSETBV 0x08d > > >>>> #define SVM_EXIT_RDPRU 0x08e > > >>>> +#define SVM_EXIT_INVPCID 0x0a2 > > >>>> #define SVM_EXIT_NPF 0x400 > > >>>> #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 > > >>>> #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 > > >>>> @@ -171,6 +172,7 @@ > > >>>> { SVM_EXIT_MONITOR, "monitor" }, \ > > >>>> { SVM_EXIT_MWAIT, "mwait" }, \ > > >>>> { SVM_EXIT_XSETBV, "xsetbv" }, \ > > >>>> + { SVM_EXIT_INVPCID, "invpcid" }, \ > > >>>> { SVM_EXIT_NPF, "npf" }, \ > > >>>> { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ > > >>>> { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, > > >>> "avic_unaccelerated_access" }, \ > > >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > >>>> index 285e5e1ff518..82d974338f68 100644 > > >>>> --- a/arch/x86/kvm/svm/svm.c > > >>>> +++ b/arch/x86/kvm/svm/svm.c > > >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void) > > >>>> if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > > >>>> boot_cpu_has(X86_FEATURE_AMD_SSBD)) > > >>>> kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > > >>>> + > > >>>> + /* Enable INVPCID if both PCID and INVPCID enabled */ > > >>>> + if (boot_cpu_has(X86_FEATURE_PCID) && > > >>>> + boot_cpu_has(X86_FEATURE_INVPCID)) > > >>>> + kvm_cpu_cap_set(X86_FEATURE_INVPCID); > > >>>> } > > >>>> > > >>>> static __init int svm_hardware_setup(void) > > >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm) > > >>>> clr_intercept(svm, INTERCEPT_PAUSE); > > >>>> } > > >>>> > > >>>> + /* > > >>>> + * Intercept INVPCID instruction only if shadow page table is > > >>>> + * enabled. Interception is not required with nested page table. > > >>>> + */ > > >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID)) { > > >>>> + if (!npt_enabled) > > >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); > > >>>> + else > > >>>> + clr_extended_intercept(svm, INTERCEPT_INVPCID); > > >>>> + } > > >>>> + > > >>>> if (kvm_vcpu_apicv_active(&svm->vcpu)) > > >>>> avic_init_vmcb(svm); > > >>>> > > >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct > vcpu_svm > > >>> *svm) > > >>>> return nop_interception(svm); > > >>>> } > > >>>> > > >>>> +static int invpcid_interception(struct vcpu_svm *svm) > > >>>> +{ > > >>>> + struct kvm_vcpu *vcpu = &svm->vcpu; > > >>>> + unsigned long type; > > >>>> + gva_t gva; > > >>>> + > > >>>> + /* > > >>>> + * For an INVPCID intercept: > > >>>> + * EXITINFO1 provides the linear address of the memory operand. > > >>>> + * EXITINFO2 provides the contents of the register operand. > > >>>> + */ > > >>>> + type = svm->vmcb->control.exit_info_2; > > >>>> + gva = svm->vmcb->control.exit_info_1; > > >>>> + > > >>>> + return kvm_handle_invpcid_types(vcpu, gva, type); > > >>>> +} > > >>>> + > > >>>> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = { > > >>>> [SVM_EXIT_READ_CR0] = cr_interception, > > >>>> [SVM_EXIT_READ_CR3] = cr_interception, > > >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct > > >>> vcpu_svm *svm) = { > > >>>> [SVM_EXIT_MWAIT] = mwait_interception, > > >>>> [SVM_EXIT_XSETBV] = xsetbv_interception, > > >>>> [SVM_EXIT_RDPRU] = rdpru_interception, > > >>>> + [SVM_EXIT_INVPCID] = invpcid_interception, > > >>>> [SVM_EXIT_NPF] = npf_interception, > > >>>> [SVM_EXIT_RSM] = rsm_interception, > > >>>> [SVM_EXIT_AVIC_INCOMPLETE_IPI] = > > >>> avic_incomplete_ipi_interception, > > >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct > kvm_vcpu > > >>> *vcpu) > > >>>> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && > > >>>> guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS); > > >>>> > > >>>> + /* > > >>>> + * Intercept INVPCID instruction if the baremetal has the support > > >>>> + * but the guest doesn't claim the feature. > > >>>> + */ > > >>>> + if (boot_cpu_has(X86_FEATURE_INVPCID) && > > >>>> + !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) > > >>>> + set_extended_intercept(svm, INTERCEPT_INVPCID); > > >>>> + > > >>> > > >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then > > >>> clear this intercept bit? > > >> > > >> I assume the feature enable comes in the same code path as this. I can add > > >> "if else" check here if that is what you are suggesting. > > > > > > Yes, that's what I'm suggesting. > > > > > >>> > > >>>> if (!kvm_vcpu_apicv_active(vcpu)) > > >>>> return; > > >>>> > > >>>>