[RFC PATCH 00/67] KVM: X86: TDX support

From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

* What's TDX?
TDX stands for Trust Domain Extensions. It isolates VMs from the
virtual-machine manager (VMM)/hypervisor and any other software on
the platform. [1]
For details, see the specifications [2], [3], [4], [5], [6], [7].


* The goal of this RFC patch
The purpose of this post is to get early feedback on the high-level
design of the KVM enhancements for TDX. Detailed coding issues
(variable naming etc.) are not the focus yet. This patch series is
incomplete (not working).
Although multiple software components (not only KVM but also QEMU,
guest Linux and the virtual BIOS) need to be updated, this series
includes only the KVM VMM part. For those who are curious about the
changes to the other components, there are public repositories on
GitHub. [8], [9]


* Terminology
Here are short explanations of key concepts.
For detailed explanations or other terminology, please refer to the
specifications [2], [3], [4], [5], [6], [7].
- Trust Domain (TD)
  A hardware-isolated virtual machine managed by the TDX-module.
- Secure-Arbitration Mode (SEAM)
  A new mode of the CPU. It consists of SEAM Root and SEAM Non-Root,
  which correspond to VMX Root and VMX Non-Root.
- TDX-module
  The TDX-module runs in SEAM Root and manages TD guest state.
  It provides an ABI for the VMM to manage TDs. Calling into it is an
  expensive operation.
- SEAM loader (SEAMLDR)
  An Authenticated Code Module (ACM) that loads the TDX-module.
- Secure EPT (S-EPT)
  An extended page table that is encrypted.
  The shared bit (bit 51 or 47) in the GPA selects shared vs private.
  0: private to TD, 1: shared with host VMM.


* Major touch/discussion points
The following are the major touch points where feedback is wanted.

** The file location of the boot code
The BSP launches the SEAM loader (on the BSP) to load the TDX module;
the TDX module itself is on all CPUs. The directory
arch/x86/kvm/boot/seam was chosen so that the related files sit close
to each other. For future maintenance and enhancement, this makes it
easy to see which files are related and need to be kept in sync.

- arch/x86/kvm/boot/seam: the current choice
  Pros:
  - The directory clearly indicates that the code is related to only
    KVM.
  - Keep files near to the related code (KVM TDX code).
  Cons:
  - It doesn't follow the existing convention.

Alternative:
The alternative is to follow the existing convention.
- arch/x86/kernel/cpu/
  Pros:
  - It follows the existing convention.
  Cons:
  - It's unclear that it's related to only KVM TDX.

- drivers/firmware/
  Yet another choice, as the TDX module can be considered firmware.
  Pros:
  - It follows the existing convention. It clarifies that the TDX
    module is firmware.
  Cons:
  - It's hard to see that the firmware is only for KVM TDX.
  - The files are far from the related code (KVM TDX).

** Coexistence of normal (VMX) VMs and TD VMs
Both legacy (normal VMX) VMs and new TD VMs must be able to coexist.
Otherwise the benefits of VM flexibility would be lost.
The main issue is that the logic of the kvm_x86_ops callbacks for
TDX differs from that for VMX, while kvm_x86_ops itself is a single
global variable: not per-VM, not per-vCPU.

Several points need to be considered.
  . No or minimal overhead when TDX is disabled(CONFIG_KVM_INTEL_TDX=n).
  . Avoid the overhead of indirect calls via function pointers.
  . Contain the changes under the arch/x86/kvm/vmx directory and share
    logic with VMX for maintainability.
    Even though the way to operate on a VM (VMX instructions vs TDX
    SEAM calls) is different, the basic idea remains the same, so much
    of the logic can be shared.
  . Future maintenance
    No huge change to kvm_x86_ops is expected in the (near) future,
    so a centralized file is acceptable.

- Wrapping kvm_x86_ops: the current choice
  Introduce a dedicated file, arch/x86/kvm/vmx/main.c (the name
  main.c was chosen just to show that it holds the main entry points
  for the callbacks), with wrapper functions around all the callbacks
  of the form "if (is-tdx) tdx-callback() else vmx-callback()".

  Pros:
  - No major change in common x86 KVM code. The change is (mostly)
    contained under arch/x86/kvm/vmx/.
  - When TDX is disabled(CONFIG_KVM_INTEL_TDX=n), the overhead is
    optimized out.
  - Micro-optimization by avoiding function pointers.
  Cons:
  - A lot of boilerplate in arch/x86/kvm/vmx/main.c.
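
The wrapper approach can be sketched roughly as follows. This is a
minimal user-space illustration, not code from the series: every name
prefixed with sketch_ or SKETCH_ is hypothetical, SKETCH_KVM_INTEL_TDX
only stands in for CONFIG_KVM_INTEL_TDX, and vm_init is just one
example callback.

```c
#include <stdbool.h>

/* Stand-in for CONFIG_KVM_INTEL_TDX; build with 0 to model TDX=n. */
#ifndef SKETCH_KVM_INTEL_TDX
#define SKETCH_KVM_INTEL_TDX 1
#endif

/* Hypothetical, heavily reduced VM structure. */
struct sketch_kvm {
	bool is_td;
};

static inline bool sketch_is_td(const struct sketch_kvm *kvm)
{
#if SKETCH_KVM_INTEL_TDX
	return kvm->is_td;
#else
	(void)kvm;
	return false;	/* constant, so the TDX branch can be dropped */
#endif
}

/* Hypothetical backends; the return values only identify the path. */
static int sketch_vmx_vm_init(struct sketch_kvm *kvm) { (void)kvm; return 1; }
static int sketch_tdx_vm_init(struct sketch_kvm *kvm) { (void)kvm; return 2; }

/* Wrapper: direct calls to either backend, no second pointer layer. */
static int sketch_vt_vm_init(struct sketch_kvm *kvm)
{
	if (sketch_is_td(kvm))
		return sketch_tdx_vm_init(kvm);
	return sketch_vmx_vm_init(kvm);
}

/* The single global ops table points at the wrappers only. */
struct sketch_x86_ops {
	int (*vm_init)(struct sketch_kvm *kvm);
};

static const struct sketch_x86_ops sketch_vt_x86_ops = {
	.vm_init = sketch_vt_vm_init,
};
```

Calling sketch_vt_x86_ops.vm_init() on a VM with is_td set takes the
TDX path and on any other VM the VMX path. The cost is one such wrapper
per callback, which is the boilerplate listed under Cons.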

Alternative:
- Introduce another callback layer under arch/x86/kvm/vmx.
  Pros:
  - No major change in common x86 KVM code. The change is (mostly)
    contained under arch/x86/kvm/vmx/.
  - Clear separation of callbacks.
  Cons:
  - Overhead in VMX even when TDX is disabled(CONFIG_KVM_INTEL_TDX=n).

- Allow per-VM kvm_x86_ops callbacks instead of global kvm_x86_ops
  Pros:
  - Clear separation of callbacks.
  Cons:
  - Big change in common x86 code.
  - Overhead in common code even when TDX is
    disabled(CONFIG_KVM_INTEL_TDX=n).

- Introduce new directory arch/x86/kvm/tdx
  Pros:
  - It clarifies that TDX is different from VMX.
  Cons:
  - Given how much code is shared with VMX, a separate directory
    complicates the sharing.

** KVM MMU Changes
The KVM MMU needs to be enhanced to handle Secure/Shared-EPT. The
high-level execution flow is mostly the same as in the normal EPT case:
EPT violation/misconfiguration -> invoke TDP fault handler ->
resolve TDP fault -> resume execution (or emulate MMIO).
The difference is that S-EPT is operated on (read/written) via TDX
SEAM calls, which are expensive, instead of by direct reads/writes of
EPT entries.
One bit of the GPA (bit 51 or 47) is repurposed to mean shared with
the host (if set to 1) or private to the TD (if cleared to 0).

- The current implementation
  . Reuse the existing MMU code with minimal updates, because the
    execution flow is mostly the same. An additional operation, the TDX
    call for S-EPT, is needed, so add hooks for it to kvm_x86_ops.
  . For performance, minimize the TDX SEAM calls that operate on
    S-EPT. When getting the corresponding S-EPT pages/entries from the
    faulting GPA, don't use a TDX SEAM call to read the S-EPT entry.
    Instead, create a shadow copy in host memory.
    Repurpose the existing kvm_mmu_page as the shadow copy of S-EPT
    and associate S-EPT with it.
  . Treat the shared bit as an attribute. Mask/unmask the bit where
    necessary to keep the existing traversal code working.
    Introduce kvm.arch.gfn_shared_mask and use "if (gfn_shared_mask)"
    for the special case.
    = 0 : for the non-TDX case
    = bit 51 or 47 set : for the TDX case

  Pros:
  - Large code reuse with minimal new hooks.
  - The execution path is the same.
  Cons:
  - Complicates the existing code.
  - Repurposing kvm_mmu_page as the shadow of Secure-EPT can be
    confusing.
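
The shared-bit mask and the shadow copy described above can be
sketched as follows. This is a simplified user-space model under
stated assumptions, not the code in the series: all sketch_ names are
hypothetical, a single 512-entry array stands in for the shadow of one
S-EPT page, and a counter stands in for the real TDX SEAM call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PT_ENTRIES	512

typedef uint64_t sketch_gfn_t;

/* Hypothetical per-VM state. */
struct sketch_vm {
	sketch_gfn_t gfn_shared_mask;	/* 0 for the non-TDX case */
	uint64_t shadow_sept[SKETCH_PT_ENTRIES]; /* host-memory shadow copy */
	unsigned long seamcalls;	/* counts modeled SEAM calls */
};

/* Convert the GPA bit position (51 or 47) into a mask in gfn space. */
static sketch_gfn_t sketch_gfn_shared_mask(unsigned int gpa_bit)
{
	return (sketch_gfn_t)1 << (gpa_bit - SKETCH_PAGE_SHIFT);
}

/* Strip the shared bit so the traversal code sees a plain gfn. */
static sketch_gfn_t sketch_gfn_unalias(const struct sketch_vm *vm,
				       sketch_gfn_t gfn)
{
	return gfn & ~vm->gfn_shared_mask;
}

/* With a zero mask (non-TDX) no gfn is ever treated as private. */
static bool sketch_gfn_is_private(const struct sketch_vm *vm,
				  sketch_gfn_t gfn)
{
	return vm->gfn_shared_mask && !(gfn & vm->gfn_shared_mask);
}

static size_t sketch_index(const struct sketch_vm *vm, sketch_gfn_t gfn)
{
	return (size_t)(sketch_gfn_unalias(vm, gfn) % SKETCH_PT_ENTRIES);
}

/* Update: one modeled SEAM call, then mirror the entry in the shadow. */
static void sketch_sept_set(struct sketch_vm *vm, sketch_gfn_t gfn,
			    uint64_t entry)
{
	vm->seamcalls++;	/* placeholder for the expensive SEAM call */
	vm->shadow_sept[sketch_index(vm, gfn)] = entry;
}

/* Lookup: served from the shadow copy, no SEAM call at all. */
static uint64_t sketch_sept_lookup(const struct sketch_vm *vm,
				   sketch_gfn_t gfn)
{
	return vm->shadow_sept[sketch_index(vm, gfn)];
}
```

In this model only updates pay for a SEAM call; every lookup on the
fault path reads host memory. A non-TDX VM leaves gfn_shared_mask at 0,
so the masking helpers become no-ops for it.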

Alternative:
- Replace direct read/write on EPT entry with TDX-SEAM call by
  introducing callbacks on EPT entry.
  Pros:
  - Straightforward.
  Cons:
  - Too many touch points.
  - Too slow due to TDX-SEAM calls.
  - Overhead even when TDX is disabled(CONFIG_KVM_INTEL_TDX=n).

- Sprinkle "if (is-tdx)" for TDX special case
  Pros:
  - Straightforward.
  Cons:
  - The result is non-generic and ugly.
  - Puts TDX-specific logic into common KVM MMU code.

** New KVM API, ioctl (sub)command, to manage TD VMs
Additional KVM APIs are needed to control TD VMs. The operations on TD
VMs are specific to TDX.

- Piggyback on and repurpose KVM_MEMORY_ENCRYPT_OP
  Although not all of the operations are memory encryption, repurpose
  it to carry the TDX-specific ioctls.
  Pros:
  - No major change in common x86 KVM code.
  Cons:
  - The operations aren't actually memory encryption, but operations
    on TD VMs.

Alternative:
- Introduce new ioctl for guest protection like
  KVM_GUEST_PROTECTION_OP and introduce subcommand for TDX.
  Pros:
  - Clean name.
  Cons:
  - One more new ioctl for guest protection.
  - Confusion between KVM_MEMORY_ENCRYPT_OP and KVM_GUEST_PROTECTION_OP.

- Rename KVM_MEMORY_ENCRYPT_OP to KVM_GUEST_PROTECTION_OP and keep
  KVM_MEMORY_ENCRYPT_OP with the same value for uapi compatibility:
  "#define KVM_MEMORY_ENCRYPT_OP KVM_GUEST_PROTECTION_OP".
  Pros:
  - No new ioctl, and a more suitable name.
  Cons:
  - May confuse existing user programs.
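
The renaming alternative, combined with a TDX subcommand, can be
sketched as follows. This is an illustration only: the request number
is a placeholder rather than the real uapi value, and the subcommand
names, the argument structure and sketch_vm_ioctl() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder request number, NOT the real uapi value. */
#define SKETCH_GUEST_PROTECTION_OP	0xbaU
/* The old name stays as an alias so existing user programs still build. */
#define SKETCH_MEMORY_ENCRYPT_OP	SKETCH_GUEST_PROTECTION_OP

/* Hypothetical TDX subcommands carried by the ioctl. */
enum sketch_tdx_cmd_id {
	SKETCH_TDX_INIT_VM,
	SKETCH_TDX_INIT_VCPU,
};

/* Hypothetical argument block: subcommand id plus opaque payload. */
struct sketch_tdx_cmd {
	uint32_t id;
	uint64_t data;
};

/* Models the VM ioctl entry point; returns 0 or a negative errno. */
static int sketch_vm_ioctl(unsigned int request,
			   const struct sketch_tdx_cmd *cmd)
{
	switch (request) {
	case SKETCH_GUEST_PROTECTION_OP:	/* the old name lands here too */
		switch (cmd->id) {
		case SKETCH_TDX_INIT_VM:
		case SKETCH_TDX_INIT_VCPU:
			return 0;	/* the real TDX work would go here */
		default:
			return -EINVAL;
		}
	default:
		return -ENOTTY;
	}
}
```

Because both names expand to the same request number, a caller using
the old name reaches the same case label as one using the new name.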


* Items unsupported/out of scope
The following items are unsupported at the moment or out of scope.
- Large page(2MB, 1GB) support
- Page migration
- Debugger support(qemu gdb stub)
- Removing the user space (qemu) mapping of guest private memory
  Because this topic is big in itself and will take time, the effort
  is taking place independently. [12]
- Attestation
  The end-to-end integration is required.
- Live migration
  TDX 1.0 doesn't support this.
- Nested virtualization
  TDX 1.0 doesn't support this.


* Related repositories
The software that enables TDX is composed of several components, not
only the KVM/x86 enablement. There are several publicly available
repositories on GitHub. They are incomplete and not working, and are
provided only as a reference for those who are curious.
- TDX host/guest [8]
- TDX Virtual Firmware [9]
- The qemu changes aren't published (yet).


* Related presentations
At KVM Forum 2020, several presentations related to TDX were given. [10] [11]
They are helpful for understanding TDX and the related KVM/qemu changes.


* Patch organization
The main changes are in only 2 patches (62 and 64).
The preceding patches (01-61) refactor the code and introduce
additional hooks. Patch 64 plugs the hooks into the TDX implementation.

- patch 01-16: Preparations. Introduce architecture constants,
               refactor code, and export symbols for the following
               patches.
- patch 17-33: Start to introduce the new type of VM and allow the
               coexistence of multiple types of VM. Allow/disallow
               KVM ioctls where appropriate. In particular, turn
               per-system ioctls into per-VM ioctls.
- patch 34-43: Refactor the KVM MMU and add new hooks for Secure
               EPT.
- patch 44-48: Refactor the KVM/VMX code and add the wrapper for
               kvm_x86_ops for VMX and TDX.
- patch 52-61: Introduce TDX architectural constants/structures and
               helper functions.
- patch 62-63: Load/init the TDX module during boot.
- patch 64-65: Main patch to add "basic" support for building/running
               TDX.
- patch 66   : Not for review; it is only there to make the build
               succeed.


[1] TDX specification
   https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
[2] Intel Trust Domain Extensions (Intel TDX)
   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf
[3] Intel CPU Architectural Extensions Specification
   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf
[4] Intel TDX Module 1.0 EAS
   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
[5] Intel TDX Loader Interface Specification
   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf
[6] Intel TDX Guest-Hypervisor Communication Interface
   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf
[7] Intel TDX Virtual Firmware Design Guide
   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.
[8] intel public github
   kvm TDX branch: https://github.com/intel/tdx/tree/kvm
   TDX guest branch: https://github.com/intel/tdx/tree/guest
[9] tdvf
    https://github.com/tianocore/edk2-staging/tree/TDVF
[10] KVM forum 2020: Intel Virtualization Technology Extensions to
     Enable Hardware Isolated VMs
     https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel
[11] Linux Security Summit EU 2020:
     Architectural Extensions for Hardware Virtual Machine Isolation
     to Advance Confidential Computing in Public Clouds - Ravi Sahita
     & Jun Nakajima, Intel Corporation
     https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation
[12] [RFCv2,00/16] KVM protected memory extension
     https://lkml.org/lkml/2020/10/20/66


Isaku Yamahata (4):
  KVM: x86: Make KVM_CAP_X86_SMM a per-VM capability
  KVM: Add per-VM flag to mark read-only memory as unsupported
  fixup! KVM: TDX: Add "basic" support for building and running Trust
    Domains
  KVM: X86: not for review: add dummy file for TDX-SEAM module

Kai Huang (3):
  KVM: x86: Add per-VM flag to disable in-kernel I/O APIC and level
    routes
  KVM: TDX: Add SEAMRR related MSRs macro definition
  cpu/hotplug: Document that TDX also depends on booting CPUs once

Rick Edgecombe (1):
  KVM: x86: Add infrastructure for stolen GPA bits

Sean Christopherson (58):
  x86/cpufeatures: Add synthetic feature flag for TDX (in host)
  x86/msr-index: Define MSR_IA32_MKTME_KEYID_PART used by TDX
  KVM: Export kvm_io_bus_read for use by TDX for PV MMIO
  KVM: Enable hardware before doing arch VM initialization
  KVM: x86: Split core of hypercall emulation to helper function
  KVM: x86: Export kvm_mmio tracepoint for use by TDX for PV MMIO
  KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
  KVM: Add infrastructure and macro to mark VM as bugged
  KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
    bugged
  KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
    VM
  KVM: x86/mmu: Mark VM as bugged if page fault returns RET_PF_INVALID
  KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd()
  KVM: Add max_vcpus field in common 'struct kvm'
  KVM: x86: Add vm_type to differentiate legacy VMs from protected VMs
  KVM: x86: Hoist kvm_dirty_regs check out of sync_regs()
  KVM: x86: Introduce "protected guest" concept and block disallowed
    ioctls
  KVM: x86: Add per-VM flag to disable direct IRQ injection
  KVM: x86: Add flag to disallow #MC injection / KVM_X86_SETUP_MCE
  KVM: x86: Add flag to mark TSC as immutable (for TDX)
  KVM: Add per-VM flag to disable dirty logging of memslots for TDs
  KVM: x86: Allow host-initiated WRMSR to set X2APIC regardless of CPUID
  KVM: x86: Add kvm_x86_ops .cache_gprs() and .flush_gprs()
  KVM: x86: Add support for vCPU and device-scoped KVM_MEMORY_ENCRYPT_OP
  KVM: x86: Introduce vm_teardown() hook in kvm_arch_vm_destroy()
  KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched
    behavior
  KVM: x86: Check for pending APICv interrupt in kvm_vcpu_has_events()
  KVM: x86: Add option to force LAPIC expiration wait
  KVM: x86: Add guest_supported_xss placholder
  KVM: Export kvm_is_reserved_pfn() for use by TDX
  KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault
  KVM: x86/mmu: Track shadow MMIO value on a per-VM basis
  KVM: x86/mmu: Ignore bits 63 and 62 when checking for "present" SPTEs
  KVM: x86/mmu: Allow non-zero init value for shadow PTE
  KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce
    indentation
  KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
  KVM: x86/mmu: Frame in support for private/inaccessible shadow pages
  KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
  KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX
  KVM: VMX: Modify NMI and INTR handlers to take intr_info as param
  KVM: VMX: Move NMI/exception handler to common helper
  KVM: VMX: Split out guts of EPT violation to common/exposed function
  KVM: VMX: Define EPT Violation architectural bits
  KVM: VMX: Define VMCS encodings for shared EPT pointer
  KVM: VMX: Add 'main.c' to wrap VMX and TDX
  KVM: VMX: Move setting of EPT MMU masks to common VT-x code
  KVM: VMX: Move register caching logic to common code
  KVM: TDX: Add TDX "architectural" error codes
  KVM: TDX: Add architectural definitions for structures and values
  KVM: TDX: Define TDCALL exit reason
  KVM: TDX: Add macro framework to wrap TDX SEAMCALLs
  KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers
  KVM: VMX: Add macro framework to read/write VMCS for VMs and TDs
  KVM: VMX: Move AR_BYTES encoder/decoder helpers to common.h
  KVM: VMX: MOVE GDT and IDT accessors to common code
  KVM: VMX: Move .get_interrupt_shadow() implementation to common VMX
    code
  KVM: TDX: Load and init TDX-SEAM module during boot
  KVM: TDX: Add "basic" support for building and running Trust Domains
  KVM: x86: Mark the VM (TD) as bugged if non-coherent DMA is detected

Zhang Chen (1):
  x86/cpu: Move get_builtin_firmware() common code (from microcode only)

 arch/arm64/include/asm/kvm_host.h     |    3 -
 arch/arm64/kvm/arm.c                  |    7 +-
 arch/arm64/kvm/vgic/vgic-init.c       |    6 +-
 arch/x86/Kbuild                       |    1 +
 arch/x86/include/asm/cpu.h            |    5 +
 arch/x86/include/asm/cpufeatures.h    |    1 +
 arch/x86/include/asm/kvm_boot.h       |   43 +
 arch/x86/include/asm/kvm_host.h       |   52 +-
 arch/x86/include/asm/microcode.h      |    3 -
 arch/x86/include/asm/msr-index.h      |   10 +
 arch/x86/include/asm/vmx.h            |    6 +
 arch/x86/include/asm/vmxfeatures.h    |    2 +-
 arch/x86/include/uapi/asm/kvm.h       |   55 +
 arch/x86/include/uapi/asm/vmx.h       |    4 +-
 arch/x86/kernel/cpu/common.c          |   20 +
 arch/x86/kernel/cpu/intel.c           |    4 +
 arch/x86/kernel/cpu/microcode/core.c  |   18 -
 arch/x86/kernel/cpu/microcode/intel.c |    1 +
 arch/x86/kernel/setup.c               |    3 +
 arch/x86/kvm/Kconfig                  |    8 +
 arch/x86/kvm/Makefile                 |    2 +-
 arch/x86/kvm/boot/Makefile            |    5 +
 arch/x86/kvm/boot/seam/seamldr.S      |  188 +++
 arch/x86/kvm/boot/seam/seamloader.c   |  162 +++
 arch/x86/kvm/boot/seam/tdx.c          | 1131 +++++++++++++++
 arch/x86/kvm/ioapic.c                 |    4 +
 arch/x86/kvm/irq_comm.c               |    6 +-
 arch/x86/kvm/lapic.c                  |    9 +-
 arch/x86/kvm/lapic.h                  |    2 +-
 arch/x86/kvm/mmu.h                    |   33 +-
 arch/x86/kvm/mmu/mmu.c                |  519 +++++--
 arch/x86/kvm/mmu/mmu_internal.h       |    5 +
 arch/x86/kvm/mmu/paging_tmpl.h        |   27 +-
 arch/x86/kvm/mmu/spte.c               |   36 +-
 arch/x86/kvm/mmu/spte.h               |   30 +-
 arch/x86/kvm/svm/svm.c                |   22 +-
 arch/x86/kvm/trace.h                  |   57 +
 arch/x86/kvm/vmx/common.h             |  180 +++
 arch/x86/kvm/vmx/main.c               | 1130 +++++++++++++++
 arch/x86/kvm/vmx/posted_intr.c        |    6 +
 arch/x86/kvm/vmx/tdx.c                | 1847 +++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h                |  245 ++++
 arch/x86/kvm/vmx/tdx_arch.h           |  230 +++
 arch/x86/kvm/vmx/tdx_errno.h          |   91 ++
 arch/x86/kvm/vmx/tdx_ops.h            |  544 ++++++++
 arch/x86/kvm/vmx/tdx_stubs.c          |   45 +
 arch/x86/kvm/vmx/vmenter.S            |  140 ++
 arch/x86/kvm/vmx/vmx.c                |  537 ++-----
 arch/x86/kvm/vmx/vmx.h                |    2 +
 arch/x86/kvm/x86.c                    |  296 +++-
 include/linux/kvm_host.h              |   51 +-
 include/uapi/linux/kvm.h              |    2 +
 kernel/cpu.c                          |    4 +
 lib/firmware/intel-seam/libtdx.so     |    0
 tools/arch/x86/include/uapi/asm/kvm.h |   55 +
 tools/include/uapi/linux/kvm.h        |    2 +
 virt/kvm/kvm_main.c                   |   45 +-
 57 files changed, 7230 insertions(+), 712 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm_boot.h
 create mode 100644 arch/x86/kvm/boot/Makefile
 create mode 100644 arch/x86/kvm/boot/seam/seamldr.S
 create mode 100644 arch/x86/kvm/boot/seam/seamloader.c
 create mode 100644 arch/x86/kvm/boot/seam/tdx.c
 create mode 100644 arch/x86/kvm/vmx/common.h
 create mode 100644 arch/x86/kvm/vmx/main.c
 create mode 100644 arch/x86/kvm/vmx/tdx.c
 create mode 100644 arch/x86/kvm/vmx/tdx.h
 create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
 create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
 create mode 100644 arch/x86/kvm/vmx/tdx_ops.h
 create mode 100644 arch/x86/kvm/vmx/tdx_stubs.c
 create mode 100644 lib/firmware/intel-seam/libtdx.so

-- 
2.17.1



