Control-flow Enforcement Technology (CET) is a kind of CPU feature used to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two sub-features(SHSTK,IBT) to defend against ROP/COP/JOP style control-flow subversion attacks. Shadow Stack (SHSTK): A shadow stack is a second stack used exclusively for control transfer operations. The shadow stack is separate from the data/normal stack and can be enabled individually in user and kernel mode. When shadow stack is enabled, CALL pushes the return address on both the data and shadow stack. RET pops the return address from both stacks and compares them. If the return addresses from the two stacks do not match, the processor generates a #CP. Indirect Branch Tracking (IBT): IBT introduces new instruction(ENDBRANCH)to mark valid target addresses of indirect branches (CALL, JMP etc...). If an indirect branch is executed and the next instruction is _not_ an ENDBRANCH, the processor generates a #CP. These instruction behaves as a NOP on platforms that doesn't support CET. Dependency: -------------------------------------------------------------------------- The first 2 patches are taken over from CET native series[1] in kernel tip. They're prerequisites for this KVM patch series as CET user mode xstate and some feature bits are defined in the patches. Add this KVM series to kernel tree to build qualified host kernel to support guest CET features. Also apply QEMU CET enabling patches[2] to build qualified QEMU. These kernel dependent patches will be enclosed in KVM series until CET native series is merged in mainline tree. Implementation: -------------------------------------------------------------------------- This series enables full support for guest CET SHSTK/IBT register states, i.e., CET register states in below usage models are backed by KVM. | User SHSTK | User IBT (user mode) -------------------------------------------------- Kernel SHSTK | Kernel IBT (kernel mode) | KVM cooperates with host kernel to back CET register states in each model. In this series, KVM manages guest CET kernel registers(MSRs) by itself and relies on host kernel to manage the user mode registers, thus KVM relies on capability from host XSS MSR before exposes CET features to guest. Note, guest supervisor(kernel) SHSTK cannot be fully supported by this series, therefore guest SSS_CET bit of CPUID(0x7,1):EDX[bit18] is cleared. Check SDM (Vol 1, Section 17.2.3) for details. CET states management: -------------------------------------------------------------------------- CET user mode states, MSR_IA32_{U_CET,PL3_SSP}, depends on {XSAVES,XRSTORS} instructions to swap guest and host's states. On vmexit, guest user states are saved to guest fpu area and host user mode states are loaded from thread context before vCPU returns to userspace, vice-versa on vmentry. See details in kvm_{load,put}_guest_fpu(). So CET user mode states management depends on CET user mode bit(U_CET bit) set in host XSS MSR. CET supervisor mode states are grouped into two categories : XSAVE-managed and non-XSAVE-managed, the former includes MSR_IA32_PL{0,1,2}_SSP and are controlled by CET supervisor mode bit(S_CET bit) in XSS, the later consists of MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL. The XSAVE-managed supervisor states theoretically can be handled by enabling S_CET bit in host XSS. But given the fact supervisor shadow stack isn't enabled in Linux kernel, enabling the control bit just like that for user mode states has global side-effects to all threads/tasks running on host, i.e.: 1) Introducing unnecessary XSAVE operation when switch to non-vCPU userspace within current FPU framework. 2)Forcing allocating additional space for CET supervisor states in each thread context regardless whether it's vCPU thread or not. To avoid these downsides, this series provides a KVM solution to save/reload vCPU's supervisor SHSTK states. VMX introduces new VMCS fields, {GUEST|HOST}_{S_CET,SSP,INTR_SSP_TABL}, to facilitate guest/host non-XSAVES-managed states. When VMX CET entry/exit load bits are set, guest/host MSR_IA32_{S_CET,INTR_SSP_TBL,SSP} are loaded from equivalent fields at vm-exit/entry. With these new fields, such supervisor states require no addtional KVM save/reload actions. Tests: -------------------------------------------------------------------------- This series passed basic CET user shadow stack test and kernel IBT test in L1 and L2 guest. The patch series _has_ impact to existing vmx test cases in KVM-unit-tests,the failures have been fixed in this patch[3]. All other parts of KVM unit-tests and selftests passed with this series. One new selftest app for CET MSRs is posted here[4]. To run user SHSTK test and kernel IBT test in guest , an CET capable platform is required, e.g., Sapphire Rapids server, and follow below steps to build host/guest kernel properly: 1. Build host kernel: Add this series to kernel tree and build kernel. 2. Build guest kernel: Add full CET _native_ series to kernel tree and opt-in CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK options. Build with CET enabled gcc versions(>= 8.5.0). 3. Use patched QEMU to launch a guest. Check kernel selftest test_shadow_stack_64 output: [INFO] new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001 [INFO] changing ssp from 7f8c82900ff0 to 7f8c82100ff8 [INFO] ssp is now 7f8c82101000 [OK] Shadow stack pivot [OK] Shadow stack faults [INFO] Corrupting shadow stack [INFO] Generated shadow stack violation successfully [OK] Shadow stack violation test [INFO] Gup read -> shstk access success [INFO] Gup write -> shstk access success [INFO] Violation from normal write [INFO] Gup read -> write access success [INFO] Violation from normal write [INFO] Gup write -> write access success [INFO] Cow gup write -> write access success [OK] Shadow gup test [INFO] Violation from shstk access [OK] mprotect() test [SKIP] Userfaultfd unavailable. [OK] 32 bit test Check kernel IBT with dmesg | grep CET: CET detected: Indirect Branch Tracking enabled -------------------------------------------------------------------------- Changes in v4: 1. Overhauled v3 series[5] per community review feedback. [Sean, Chao, Binbin etc.] 2. Modified CET dependency checks on host side before expose the features. 3. Added KVM specific solution for save/reload guest CET SHSTK supervisor states. 4. Rebase on kvm-x86/next [6]. [1]: CET native series: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/shstk [2]: QEMU patch: https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@xxxxxxxxx/ [3]: KVM-unit-tests fixup: https://lore.kernel.org/all/20230720115810.104890-1-weijiang.yang@xxxxxxxxx/ [4]: Selftests for CET MSRs: https://lore.kernel.org/all/20230720120401.105770-1-weijiang.yang@xxxxxxxxx/ [5]: v3 patchset: https://lore.kernel.org/all/20230511040857.6094-1-weijiang.yang@xxxxxxxxx/ [6]: Rebase branch: https://github.com/kvm-x86/linux/releases/tag/kvm-x86-next-2023.06.22 Patch 1-2: Native dependent CET patches. Patch 3-5: Enable XSS support in KVM. Patch 6 : Prepare patch for XSAVE-managed MSR access Patch 7-11: Common patches to support CET on x86. Patch 12-18: VMX specific patches to support CET. Patch 19-20: nVMX patches for CET support in nested VM. ----------------------------------------------------------------------------- Rick Edgecombe (2): x86/cpufeatures: Add CPU feature flags for shadow stacks x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Sean Christopherson (2): KVM:x86: Report XSS as to-be-saved if there are supported features KVM:x86: Load guest FPU state when access XSAVE-managed MSRs Yang Weijiang (16): KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS KVM:x86: Initialize kvm_caps.supported_xss KVM:x86: Add fault checks for guest CR4.CET setting KVM:x86: Report KVM supported CET MSRs as to-be-saved KVM:x86: Add common code of CET MSR access KVM:x86: Make guest supervisor states as non-XSAVE managed KVM:x86: Save and reload GUEST_SSP to/from SMRAM KVM:VMX: Introduce CET VMCS fields and control bits KVM:VMX: Emulate read and write to CET MSRs KVM:VMX: Set up interception for CET MSRs KVM:VMX: Save host MSR_IA32_S_CET to VMCS field KVM:x86: Optimize CET supervisor SSP save/reload KVM:x86: Enable CET virtualization for VMX and advertise to userspace KVM:x86: Enable guest CET supervisor xstate bit support KVM:nVMX: Refine error code injection to nested VM KVM:nVMX: Enable CET support for nested VM arch/arm64/include/asm/kvm_host.h | 1 + arch/mips/include/asm/kvm_host.h | 1 + arch/powerpc/include/asm/kvm_host.h | 1 + arch/riscv/include/asm/kvm_host.h | 1 + arch/s390/include/asm/kvm_host.h | 1 + arch/x86/include/asm/cpufeatures.h | 2 + arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/fpu/types.h | 16 +- arch/x86/include/asm/fpu/xstate.h | 6 +- arch/x86/include/asm/kvm_host.h | 6 +- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/vmx.h | 8 + arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kernel/cpu/cpuid-deps.c | 1 + arch/x86/kernel/fpu/xstate.c | 90 +++++---- arch/x86/kvm/cpuid.c | 32 +++- arch/x86/kvm/cpuid.h | 10 + arch/x86/kvm/smm.c | 17 ++ arch/x86/kvm/smm.h | 2 +- arch/x86/kvm/vmx/capabilities.h | 10 + arch/x86/kvm/vmx/nested.c | 49 ++++- arch/x86/kvm/vmx/nested.h | 7 + arch/x86/kvm/vmx/vmcs12.c | 6 + arch/x86/kvm/vmx/vmcs12.h | 14 +- arch/x86/kvm/vmx/vmx.c | 134 ++++++++++++- arch/x86/kvm/vmx/vmx.h | 6 +- arch/x86/kvm/x86.c | 228 +++++++++++++++++++++-- arch/x86/kvm/x86.h | 31 +++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 1 + 30 files changed, 606 insertions(+), 86 deletions(-) base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60 -- 2.27.0