On 4/14/2023 2:25 PM, Chao Gao wrote:
> From: Zhang Chen <chen.zhang@xxxxxxxxx>
>
> Currently KVM disables interception of IA32_SPEC_CTRL after a non-0 is
> written to IA32_SPEC_CTRL by guest. Then, guest is allowed to write any
> value to hardware.
>
> "virtualize IA32_SPEC_CTRL" is a new tertiary vm-exec control. This
> feature allows KVM to specify that certain bits of the IA32_SPEC_CTRL
> MSR cannot be modified by guest software.
>
> Two VMCS fields are added:
>
>   IA32_SPEC_CTRL_MASK: bits that guest software cannot modify
>   IA32_SPEC_CTRL_SHADOW: value that guest software expects to be in the
>   IA32_SPEC_CTRL MSR
>
> On rdmsr, the shadow value is returned. on wrmsr, EDX:EAX is written
> to the IA32_SPEC_CTRL_SHADOW and (cur_val & mask) | (EDX:EAX & ~mask)
> is written to the IA32_SPEC_CTRL MSR, where
>  * cur_val is the original value of IA32_SPEC_CTRL MSR
>  * mask is the value of IA32_SPEC_CTRL_MASK
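
Just to make the formula above concrete, the write path boils down to
roughly the sketch below. This is only an illustration of the masking
described in the commit message, not code from the patch; the helper
name, its parameters, and the use of a plain function instead of the
real VMCS fields are all made up here.

	#include <linux/types.h>

	/*
	 * Hypothetical helper: fold a guest WRMSR(IA32_SPEC_CTRL) value
	 * through the mask/shadow pair described above.
	 */
	static u64 spec_ctrl_masked_write(u64 cur_val, u64 mask,
					  u64 guest_val, u64 *shadow)
	{
		/* The guest's RDMSR view is whatever it last wrote. */
		*shadow = guest_val;

		/*
		 * Bits covered by the mask keep their current value; only
		 * the unmasked bits take the guest-supplied EDX:EAX value.
		 */
		return (cur_val & mask) | (guest_val & ~mask);
	}

In other words, a bit set in the mask only changes in the shadow value
the guest reads back; the value that reaches the hardware MSR keeps
whatever KVM chose for that bit.
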
> Add a mask e.g., loaded_vmcs->spec_ctrl_mask to represent the bits guest
> shouldn't change. It is 0 for now and some bits will be added by
> following patches. Use per-vmcs cache to avoid unnecessary vmcs_write()
> on nested transition because the mask is expected to be rarely changed
> and the same for vmcs01 and vmcs02.
>
> To prevent guest from changing the bits in the mask, enable "virtualize
> IA32_SPEC_CTRL" if supported or emulate its behavior by intercepting
> the IA32_SPEC_CTRL msr. Emulating "virtualize IA32_SPEC_CTRL" behavior
> is mainly to give the same capability to KVM running on potential broken
> hardware or L1 guests.
>
> To avoid L2 evading the enforcement, enable "virtualize IA32_SPEC_CTRL"
> in vmcs02. Always update the guest (shadow) value of IA32_SPEC_CTRL MSR
> and the mask to preserve them across nested transitions. Note that the
> shadow value may be changed because L2 may access the IA32_SPEC_CTRL
> directly and the mask may be changed due to migration when L2 vCPUs are
> running.
>
> Co-developed-by: Chao Gao <chao.gao@xxxxxxxxx>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> Signed-off-by: Zhang Chen <chen.zhang@xxxxxxxxx>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>

Duplicated SOB. Move the Co-developed-by down like other patches.

> Tested-by: Jiaan Lu <jiaan.lu@xxxxxxxxx>
> ---
>  arch/x86/include/asm/vmx.h         |  5 ++++
>  arch/x86/include/asm/vmxfeatures.h |  2 ++
>  arch/x86/kvm/vmx/capabilities.h    |  5 ++++
>  arch/x86/kvm/vmx/nested.c          | 13 ++++++++++
>  arch/x86/kvm/vmx/vmcs.h            |  2 ++
>  arch/x86/kvm/vmx/vmx.c             | 34 ++++++++++++++++++++-----
>  arch/x86/kvm/vmx/vmx.h             | 40 +++++++++++++++++++++++++++++-
>  7 files changed, 94 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 498dc600bd5c..b9f88ecf20c3 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -81,6 +81,7 @@
>   * Definitions of Tertiary Processor-Based VM-Execution Controls.
>   */
>  #define TERTIARY_EXEC_IPI_VIRT			VMCS_CONTROL_BIT(IPI_VIRT)
> +#define TERTIARY_EXEC_SPEC_CTRL_VIRT		VMCS_CONTROL_BIT(SPEC_CTRL_VIRT)
>
>  #define PIN_BASED_EXT_INTR_MASK			VMCS_CONTROL_BIT(INTR_EXITING)
>  #define PIN_BASED_NMI_EXITING			VMCS_CONTROL_BIT(NMI_EXITING)
> @@ -233,6 +234,10 @@ enum vmcs_field {
>  	TERTIARY_VM_EXEC_CONTROL_HIGH	= 0x00002035,
>  	PID_POINTER_TABLE		= 0x00002042,
>  	PID_POINTER_TABLE_HIGH		= 0x00002043,
> +	IA32_SPEC_CTRL_MASK		= 0x0000204A,
> +	IA32_SPEC_CTRL_MASK_HIGH	= 0x0000204B,
> +	IA32_SPEC_CTRL_SHADOW		= 0x0000204C,
> +	IA32_SPEC_CTRL_SHADOW_HIGH	= 0x0000204D,
>  	GUEST_PHYSICAL_ADDRESS		= 0x00002400,
>  	GUEST_PHYSICAL_ADDRESS_HIGH	= 0x00002401,
>  	VMCS_LINK_POINTER		= 0x00002800,
> diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
> index c6a7eed03914..c70d0769b7d0 100644
> --- a/arch/x86/include/asm/vmxfeatures.h
> +++ b/arch/x86/include/asm/vmxfeatures.h
> @@ -89,4 +89,6 @@
>
>  /* Tertiary Processor-Based VM-Execution Controls, word 3 */
>  #define VMX_FEATURE_IPI_VIRT		( 3*32+  4) /* Enable IPI virtualization */
> +#define VMX_FEATURE_SPEC_CTRL_VIRT	( 3*32+  7) /* Enable IA32_SPEC_CTRL virtualization */
> +
>  #endif /* _ASM_X86_VMXFEATURES_H */
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 45162c1bcd8f..a7ab70b55acf 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -138,6 +138,11 @@ static inline bool cpu_has_tertiary_exec_ctrls(void)
>  		CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
>  }
>
> +static __always_inline bool cpu_has_spec_ctrl_virt(void)

Do we need __always_inline here to force inline code generation, or
should this just align with the other cpu_has_xxx() functions and use
the plain inline annotation?

> +{
> +	return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_SPEC_CTRL_VIRT;
> +}
> +
>  static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
>  {
>  	return vmcs_config.cpu_based_2nd_exec_ctrl &