From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> Document microarchitectural features of Intel SGX relevant to the kernel. Cc: linux-doc@xxxxxxxxxxxxxxx Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> --- Documentation/x86/index.rst | 1 + Documentation/x86/sgx/1.Architecture.rst | 431 +++++++++++++++++++++++ Documentation/x86/sgx/index.rst | 16 + 3 files changed, 448 insertions(+) create mode 100644 Documentation/x86/sgx/1.Architecture.rst create mode 100644 Documentation/x86/sgx/index.rst diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index af64c4bb4447..f3f3ba45de35 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -30,3 +30,4 @@ x86-specific Documentation usb-legacy-support i386/index x86_64/index + sgx/index diff --git a/Documentation/x86/sgx/1.Architecture.rst b/Documentation/x86/sgx/1.Architecture.rst new file mode 100644 index 000000000000..a4de6c610231 --- /dev/null +++ b/Documentation/x86/sgx/1.Architecture.rst @@ -0,0 +1,431 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============ +Architecture +============ + +Introduction +============ + +SGX is a set of instructions and mechanisms that enable ring 3 applications to +set aside private regions of code and data for the purpose of establishing and +running enclaves. An enclave is a secure entity whose private memory can only +be accessed by code running within the enclave. Accesses from outside the +enclave, including software running at a higher privilege level and other +enclaves, are disallowed by hardware. + +SGX also provides for local and remote attestation. `Attestation`_ allows an +enclave to attest its identity, that it has not been tampered with, that it is +running on a genuine platform with Intel SGX enabled, and the security +properties of the platform on which it is running. + +You can determine if your CPU supports SGX by querying ``/proc/cpuinfo``: + + ``cat /proc/cpuinfo | grep sgx`` + + +Enclave Page Cache +================== + +SGX utilizes an Enclave Page Cache (EPC) to store pages that are associated +with an enclave. The EPC is secure storage whose exact physical implementation +is micro-architecture specific (see `EPC Implemenations`_). Similar to normal +system memory, the EPC is managed by privileged software using conventional +paging mechanisms, e.g. the kernel can grant/deny access to EPC memory by +manipulating a process' page tables, and can swap pages in/out of the EPC in +order to oversubscribe the EPC. + +Unlikely regular memory, hardware prevents arbitrary insertion, eviction, +deletion, access, etc... to/from the EPC. Software must instead use dedicated +`SGX instructions`_ to operate on the EPC, which enables the processor to +provide SGX's security guarantees by enforcing various restrictions and +behaviors, e.g. limits concurrent accesses to EPC pages and ensures proper TLB +flushing when moving pages in/out of the EPC. + +Accesses to EPC pages are allowed if and only if the access is classified as an +"enclave access". There are two categories of allowed enclave accesses: direct +and indirect. Direct enclave accesses are generated if and only the processor +is executing in Enclave Mode (see `Enclave execution`_). Indirect enclave +accesses are generated by various ENCL{S,U,V} functions, many of which can be +executed outside of Enclave Mode. + +Non-enclave accesses to the EPC result in undefined behavior. Conversely, +enclave accesses to non-EPC memory result in a page fault (#PF)[1]_. Page +faults due to invalid enclave accesses set the PF_SGX flag (bit 15) in the page +fault error code[2]_. + +Although all EPC implementations will undoubtedly encrypt the EPC itself, all +all EPC code/data is stored unencrypted in the processor's caches. I.e. SGX +relies on the aforementioned mechanisms to protect an enclave's secrets while +they are resident in the cache. + +Note, EPC pages are always 4KB sized and aligned. Software can map EPC using +using large pages, but the processor always operates on a 4KB granularity when +working with EPC pages. + + +SGX instructions +================ + +SGX introduces three new instructions, ENCLS, ENCLU and ENCLV, for Supervisor, +User and Virtualization respectively. ENCL{S,U,V} are umbrella instructions, +using a single opcode as the front end to a variety of SGX functions. The leaf +function to execute is specified via %eax, with %rbx, %rcx and %rdx optionally +used for leaf-specific purposes. + +Note that supervisor software, i.e. the kernel, creates and manages enclaves, +but only user-level software can execute/enter an enclave. + +ENCLS Leafs +----------- + + - ECREATE: create an enclave + - EADD: add page to an uninitialized enclave + - EAUG: add page to an initialized enclave + - EEXTEND: extended the measurement of an (uninitialized) enclave + - EINIT: verify and initialize enclave + - EDBG{RD,WR}: read/write from/to a debug enclave’s memory + - EMODPR: restrict an EPC page’s permissions + - EMODT: modify an EPC page’s type + - EBLOCK: mark a page as blocked in EPCM + - ETRACK{C}: activate blocking tracing + - EWB: write back page from EPC to regular memory + - ELD{B,U}{C}: load page in {un}blocked state from system memory to EPC + - EPA: add version array (use to track evicted EPC pages) + - EREMOVE: remove a page from EPC + - ERDINFO: retrieve info about an EPC page from EPCM + +ENCLU Leafs +----------- + - EENTER: enter an enclave + - ERESUME: resume execution of an interrupted enclave + - EEXIT: exit an enclave + - EGETKEY: retrieve a cryptographic key from the processor + - EREPORT: generate a cryptographic report describing an enclave + - EMODPE: extend an EPC page's permissions + - EACCEPT: accept changes to an EPC page + - EACCEPTCOPY: copy an existing EPC page to an uninitialized EPC page + +ENCLV Leafs +----------- + - E{DEC,INC}VIRTCHILD: {dec,inc}rement SECS virtual refcount + - ESETCONTEXT: set SECS’ context pointer + + +EPC page types +============== + +All pages in the EPC have an explicit page type identifying the type of page. +The type of page affects the page's accessibility, concurrency requirements, +lifecycle, etc... + +SGX Enclave Control Structure (SECS) + An enclave is defined and referenced by an SGX Enclave Control Structure. + When creating an enclave (via ECREATE), software provides a source SECS for + the enclave, which is copied into a target EPC page. The source SECS + contains security and measurement information, as well as attributes and + properties of the enclave. Once the SECS is copied into the EPC, it's used + by the processor to store enclave metadata, e.g. the number of EPC pages + associated with the enclave, and is no longer directly accessible by + software. + +Regular (REG) + Regular EPC pages contain the code and data of an enclave. Code and data + pages can be added to an uninitialized enclave (prior to EINIT) via EADD. + Post EINIT, pages can be added to an enclave via EAUG. Pages added via + EAUG must be explicitly accepted by the enclave via EACCEPT or EACCEPTCOPY. + +Thread Control Structure (TCS) + Thread Control Structure pages define the entry points to an enclave and + track the execution state of an enclave thread. A TCS can only be used by + a single logical CPU at any given time, but otherwise has no attachment to + any particular logical CPU. Like regular pages, TCS pages are added to + enclaves via EADD and EINIT. + +Version Array (VA) + Version Array pages contain 512 slots, each of which can contain a version + number for a page evicted from the EPC. A version number is a unique 8-byte + value that is fed into the MAC computation used to verify the contents of an + evicted page when reloading said page into the EPC. VA pages are the only + page type not directly associated with an enclave, and are allocated in the + EPC via EPA. Note that VA pages can also be evicted from the EPC, but + doing so requires another VA page/slot to hold the version number of the VA + page being evicted. + +Trim (TRIM) + The Trim page type indicates that a page has been trimmed from the enclave’s + address space and is no longer accessible to enclave software, i.e. is about + to be removed from the enclave (via EREMOVE). Removing pages from a running + enclaves requires the enclave to explicit accept the removal (via EACCEPT). + The intermediate Trim type allows software to batch deallocation operations + to improve efficiency, e.g. minimize transitions between userspace, enclave + and kernel. + + +Enclave Page Cache Map +====================== + +The processor tracks EPC pages via the Enclave Page Cache Map (EPCM). The EPCM +is a processor-managed structure that enforces access restrictions to EPC pages +in addition to the software-managed page tables. The EPCM contains one entry +per EPC page, and although the details are implementation specific, all +implementations contain the following architectural information: + + - The status of EPC page with respect to validity and accessibility. + - An SECS identifier of the enclave to which the page belongs. + - The type of page: regular, SECS, TCS, VA or TRIM + - The linear address through which the enclave is allowed to access the page. + - The specified read/write/execute permissions on that page. + +Access violations, e.g. insufficient permissions or incorrect linear address, +detected via the EPCM result in a page fault (#PF)[1]_ exception being signaled +by the processor. Page faults due to EPCM violations set the PF_SGX flag +(bit 15) in the page fault error code[2]_. + +The EPCM is consulted if and only if walking the software-managed page tables, +i.e. the kernel's page tables, succeeds. I.e. the effective permissions for an +EPC page are a logical AND of the kernel's page tables and the corresponding +EPCM entry. This allows the kernel to make its page tables more restrictive +without triggering an EPCM violation, e.g. it may mark an entry as not-present +prior to evicting a page from the EPC. + +**IMPORTANT** For all intents and purposes the SGX architecture allows the +processor to invalidate all EPCM entries at will, i.e. requires that software +be prepared to handle an EPCM fault at any time. Most processors are expected +to implement the EPC{M} as a subset of system DRAM that is encrypted with an +ephemeral key, i.e. a key that is randomly generated at processor reset. As a +result of using an ephemeral key, the contents of the EPC{M} are lost when the +processor is powered down as part of an S3 transition or when a virtual machine +is live migrated to a new physical system. + + +Enclave initialization +====================== + +Because software cannot directly access the EPC except when executing in an +enclave, an enclave must be built using ENCLS functions (ECREATE and EADD) as +opposed to simply copying the enclave from the filesystem to memory. Once an +enclave is built, it must be initialized (via EINIT) before userspace can enter +the enclave and begin `Enclave execution`_. + +During the enclave build process, two "measurements", i.e. SHA-256 hashes, are +taken of the enclave: MRENCLAVE and MRSIGNER. MRENCLAVE measures the enclave's +contents, e.g. code/data explicitly added to the measurement (via EEXTEND), as +well as metadata from the enclave's build process, e.g. pages offsets (relative +to the enclave's base) and page permissions of all pages added to the enclave +(via EADD). MRENCLAVE is initialized by ECREATE and finalized by EINIT. +MRSIGNER is simply the SHA-256 hash of the public key used to sign the enclave. + +EINIT accepts two parameters in addition to the SECS of the target enclave: an +Enclave Signature Struct (SIGSTRUCT) and an EINIT token (EINITTOKEN). +SIGSTRUCT is a structure created and signed by the enclave's developer. Among +other fields, SIGSTRUCT contains the expected MRENCLAVE of the enclave and the +MRSIGNER of the enclave. SIGSTRUCT's MRENCLAVE is used by the processor to +verify that the enclave was properly built (at runtime), and its SIGSTRUCT is +copied to the SECS upon successful EINIT. EINITTOKEN is an optional parameter +that is consumed as part of `Launch Control`_. + + +Enclave execution +================= + +Enclaves execute in a bespoke sub-mode of ring 3, appropriately named Enclave +Mode. Enclave Mode changes behavior in key ways to support SGX's security +guarantees and to reduce the probability of unintentional disclosure of +sensitive data. + +A notable cornerstone of Enclave Mode is the Enclave Linear Range (ELRANGE). +An enclave is associated with one, and only one, contiguous linear address +range, its ELRANGE. The ELRANGE is specified via the SIZE and BASEADDR fields +in the SECS (provided to ECREATE). The processor queries the active enclave's +ELRANGE to differentiate enclave and non-enclave accesses, i.e. accesses that +originate in Enclave Mode *and* whose linear address falls within ELRANGE are +considered (direct) enclave accesses. Note, the processor also generates +(indirect) enclave accesses when executing ENCL* instructions, which may occur +outside of Enclave Mode, e.g. when copying the SECS to its target EPC page +during ECREATE. + +Enclave Mode changes include, but are not limited to: + + - Permits direct software access to EPC pages owned by the enclave + - Ensures enclave accesses map to the EPC (EPCM violation, i.e. #PF w/ PF_SGX) + - Prevents executing code outside the enclave's ELRANGE (#GP fault) + - Changes the behavior of exceptions/events + - Causes many instructions to become illegal, i.e. generate an exception + - Supresses all instruction breakpoints* + - Suppresses data breakpoints within enclave's ELRANGE* + + * For non-debug enclaves. + +Transitions to/from Enclave Mode have semantics that are a lovely blend of +SYSCALL, SYSRET and VM-Exit. In normal execution, entering and exiting Enclave +Mode can only be done through EENTER and EEXIT respectively. EENTER+EEXIT is +analogous to SYSCALL+SYSRET, e.g. EENTER/SYSCALL load RCX with the next RIP and +EEXIT/SYSRET load RIP from R{B,C}X, and EENTER can only jump to a predefined +location controlled by the enclave/kernel. + +But when an exception, interrupt, VM-Exit, etc... occurs, enclave transitions +behave more like VM-Exit and VMRESUME. To maintain the black box nature of the +enclave, the processor automatically switches register context when any of the +aforementioned events occur (the SDM refers to such events as Enclave Exiting +Events (EEE)). + +To handle an EEE, the processor performs an Asynchronous Enclave Exits (AEX). +Note, although exceptions and traps are synchronous from a processor execution +perspective, the are asynchronous from the enclave's perspective as the enclave +is not provided an opportunity to save/fuzz state prior to exiting the enclave. +On an AEX, the processor exits the enclave to a predefined %rip called the +Asynchronous Exiting Pointer (AEP). The AEP is specified at enclave entry (via +EENTER/ERESUME) and saved into the associated TCS, similar to how a hypervisor +specifies the VM-Exit target (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME), i.e. the +the AEP is an exit location controlled by the enclave's untrusted runtime. + +On an AEX, the processor fully exits the enclave prior to vectoring the event, +i.e. from the event handler's perspective the event occurred at the AEP. Thus, +IRET/RSM/VMRESUME (from the event handler) returns control to the enclave's +untrusted runtime, which can take appropriate action, e.g. immediately ERESUME +the enclave on interrupts, forward expected exceptions to the enclave, restart +the enclave on fatal exceptions, and so on and so forth. + +To preserve the enclave's state across AEX events, the processor automatically +saves architectural into a State Save Area (SSA). Because SGX supports nested +AEX events, e.g. the untrusted runtime can re-EENTER the enclave after an AEX, +which can in turn trigger an AEX, the TCS holds a pointer to a stack of SSA +frames (as opposed to a single SSA), an index to the current SSA frame and the +total number of available frames. When an AEX occurs, the processor saves the +architectural state into the TCS's current SSA frame. The untrusted runtime +can then pop the last SSA frame (off the TCS's stack) via ERESUME, i.e. restart +the enclave after the AEX is handled. + + +Launch Control +============== + +SGX provides a set of controls, referred to as Launch Control, that governs the +initialization of enclaves. The processor internally stores a SHA-256 hash of +a 3072-bit RSA public key, i.e. a MRSIGNER, often referred to as the "LE pubkey +hash". The LE pubkey hash is used during EINIT to prevent launching an enclave +without proper authorization. In order for EINIT to succeed, the enclave's +MRSIGNER (from SIGSTRUCT) *or* the MRSIGNER of the enclave's EINITTOKEN must +match the LE pubkey hash. + +An EINITTOKEN can only be created by a so called Launch Enclave (LE). A LE is +an enclave with SECS.ATTRIBUTES.EINITTOKEN_KEY=1, which grants it access to the +EINITTOKEN_KEY (retrieved via EGETKEY). EINITTOKENs provide a ready-built +mechanism for userspace to bless enclaves without requiring additional kernel +infrastructure. + +Processors that support SGX Launch Control Configuration, enumerated by the +SGX_LC flag (bit 30 in CPUID 0x7.0x0.ECX), expose the LE pubkey hash as a set +of four MSRs, aptly named IA32_SGXLEPUBKEYHASH[0-3]. The reset value of the +MSRs is an internally defined (Intel) key (processors that don't support +SGX_LC also use an internally defined key, it's just not exposed to software). + +While the IA32_SGXLEPUBKEYHASH MSRs are readable on any platform that supports +SGX_LC, the MSRs are only writable if the IA32_FEATURE_CONTROL is locked with +bit 17 ("SGX Launch Control Enable" per the SDM, or more accurately "SGX LE +pubkey hash writable") set to '1'. Note, the MSRs are also writable prior to +`SGX activation`_. + +Note, while "Launch Control Configuration" is the official feature name used by +the Intel SDM, other documentation may use the term "Flexible Launch Control", +or even simply "Launch Control". Colloquially, the vast majority of usage of +the term "Launch Control" is synonymous with "Launch Control Configuration". + + +EPC oversubscription +==================== + +SGX supports the concept of EPC oversubscription. Analogous to swapping system +DRAM to disk, enclave pages can be swapped from the EPC to memory, and later +reloaded from memory to the EPC. But because the kernel is untrusted, swapping +pages in/out of the EPC has specialized requirements: + + - The kernel cannot directly access EPC memory, i.e. cannot copy data to/from + the EPC. + - The kernel must "prove" to hardware that there are no valid TLB entries for + said page prior to eviction (a stale TLB entry would allow an attacker to + bypass SGX access controls). + - When loading a page back into the EPC, hardware must be able to verify + the integrity and freshness of the data. + - When loading an enclave page, e.g. regular and TCS pages, hardware must be + able to associate the page with an SECS, i.e. refcount an enclaves pages. + +To satisfy the above requirements, the CPU provides dedicated ENCLS functions +to support paging data in/out of the EPC: + + - EBLOCK: Mark a page as blocked in the EPC Map (EPCM). Attempting to access + a blocked page that misses the TLB will fault. + - ETRACK: Activate TLB tracking. Hardware verifies that all translations for + pages marked as "blocked" have been flushed from the TLB. + - EPA: Add Version Array page to the EPC (see `EPC page types`_) + - EWB: Write back a page from EPC to memory, e.g. RAM. Software must + supply a VA slot, memory to hold the Paging Crypto Metadata (PCMD) of the + page and obviously backing for the evicted page. + - ELD*: Load a page in {un}blocked state from memory to EPC. + +Swapped EPC pages are {de,en}crypted on their way in/out of the EPC, e.g. EWB +encrypts and ELDU decrypts. The version number (stored in a VA page) and PCMD +structure associated with an evicted EPC page seal a page (prevent undetected +modification) and ensure its freshness (prevent rollback to a stale version of +the page) while the page resides in unprotected storage, e.g. memory or disk. + + +Attestation +=========== + +SGX provides mechanisms that allow software to implement what Intel refers to +as Local Attestation (used by enclaves running on a the same physical platform +to securely identify one another) and Remote Attestation (a process by which an +enclave attests itself to a remote entity in order to gain the trust of said +entity). + +The details of Local Attestation and Remote Attestation are far beyond the +scope of this document. Please see Intel's Software Developer's Manual and/or +use your search engine of choice to learn more about SGX's attestation +capabilities. + + +EPC Implemenations +================== + +PRM with MEE +-------------- + +Initial hardware support for SGX implements the EPC by reserving a chunk of +system DRAM, referred to as Processor Reserved Memory (PRM). A percentage of +PRM is consumed by the processor to implement the EPCM, with the remainder of +PRM being exposed to software as the EPC. PRM is configured by firmware via +dedicated PRM Range Registers (PRMRRs). The PRMRRs are locked which are locked as part of SGX activation, i.e. +resizing the PRM, and thus EPC, requires rebooting the system. + +An autonomous hardware unit called the Memory Encryption Engine (MEE) protects +the confidentiality, integrity, and freshness of the PRM, e.g. {de,en}crypts +data as it is read/written from/to DRAM to provide confidentiality. + + +SGX activation +============== + +Before SGX can be fully enabled, e.g. via FEATURE_CONTROL, the platform must +undergo explicit SGX activation. SGX activation is a mechanism by which the +processor verifies and locks the platform configuration set by pre-boot +firmware, e.g. to ensure it satisfies SGX's security requirements. Before +SGX is activated (and its configuration locked), firmware can modify the +PRMRRs, e.g. to set the base/size of the PRM and thus EPC, and can also write +the SGX_LEPUBKEYHASH MSRs. Notably, the latter allows pre-boot firmware to +lock the SGX_LEPUBKEYHASH MSRs to a non-Intel value by writing the MSRs and +locking MSR_IA32_FEATURE_CONTROL without setting the "SGX LE pubkey hash +writable" flag, i.e. making the SGX_LEPUBKEYHASH MSRs readonly. + + +Footnotes +========= + +.. [1] All processors that do not support the SGX2 ISA take an errata and + signal #GP(0) instead of #PF(PF_SGX) when vectoring EPCM violations and + faults due to enclave-accesses to non-EPC memory. + +.. [2] Note that despite being vectored as a #PF, a #PF with PF_SGX has nothing + to do with conventional paging. + diff --git a/Documentation/x86/sgx/index.rst b/Documentation/x86/sgx/index.rst new file mode 100644 index 000000000000..c5dfef62e612 --- /dev/null +++ b/Documentation/x86/sgx/index.rst @@ -0,0 +1,16 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================= +Software Guard Extensions +========================= + +Intel(R) SGX is a set of architectural extensions that enables applications to +establish secure containers, a.k.a. enclaves. SGX enclaves provide security +guarantees such as integrity and confidentiality, even when running on a system +where privileged software, e.g. kernel, hypervisor, etc... is untrusted and +potentially malicious. + +.. toctree:: + :maxdepth: 1 + + 1.Architecture -- 2.20.1