Hi, doc edits for you: On 9/3/19 7:26 AM, Jarkko Sakkinen wrote: > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Document microarchitectural features of Intel SGX relevant to the > kernel. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> > --- > Documentation/x86/index.rst | 1 + > Documentation/x86/sgx/1.Architecture.rst | 431 +++++++++++++++++++++++ > Documentation/x86/sgx/index.rst | 16 + > 3 files changed, 448 insertions(+) > create mode 100644 Documentation/x86/sgx/1.Architecture.rst > create mode 100644 Documentation/x86/sgx/index.rst > diff --git a/Documentation/x86/sgx/1.Architecture.rst b/Documentation/x86/sgx/1.Architecture.rst > new file mode 100644 > index 000000000000..a4de6c610231 > --- /dev/null > +++ b/Documentation/x86/sgx/1.Architecture.rst > @@ -0,0 +1,431 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============ > +Architecture > +============ > + > +Introduction > +============ > + > +SGX is a set of instructions and mechanisms that enable ring 3 applications to > +set aside private regions of code and data for the purpose of establishing and > +running enclaves. An enclave is a secure entity whose private memory can only > +be accessed by code running within the enclave. Accesses from outside the > +enclave, including software running at a higher privilege level and other > +enclaves, are disallowed by hardware. > + > +SGX also provides for local and remote attestation. `Attestation`_ allows an > +enclave to attest its identity, that it has not been tampered with, that it is > +running on a genuine platform with Intel SGX enabled, and the security > +properties of the platform on which it is running. 
> + > +You can determine if your CPU supports SGX by querying ``/proc/cpuinfo``: > + > + ``cat /proc/cpuinfo | grep sgx`` > + > + > +Enclave Page Cache > +================== > + > +SGX utilizes an Enclave Page Cache (EPC) to store pages that are associated > +with an enclave. The EPC is secure storage whose exact physical implementation > +is micro-architecture specific (see `EPC Implemenations`_). Similar to normal > +system memory, the EPC is managed by privileged software using conventional > +paging mechanisms, e.g. the kernel can grant/deny access to EPC memory by > +manipulating a process' page tables, and can swap pages in/out of the EPC in > +order to oversubscribe the EPC. > + > +Unlikely regular memory, hardware prevents arbitrary insertion, eviction, > +deletion, access, etc... to/from the EPC. Software must instead use dedicated > +`SGX instructions`_ to operate on the EPC, which enables the processor to > +provide SGX's security guarantees by enforcing various restrictions and > +behaviors, e.g. limits concurrent accesses to EPC pages and ensures proper TLB > +flushing when moving pages in/out of the EPC. > + > +Accesses to EPC pages are allowed if and only if the access is classified as an > +"enclave access". There are two categories of allowed enclave accesses: direct > +and indirect. Direct enclave accesses are generated if and only the processor only if the > +is executing in Enclave Mode (see `Enclave execution`_). Indirect enclave > +accesses are generated by various ENCL{S,U,V} functions, many of which can be > +executed outside of Enclave Mode. > + > +Non-enclave accesses to the EPC result in undefined behavior. Conversely, > +enclave accesses to non-EPC memory result in a page fault (#PF)[1]_. Page > +faults due to invalid enclave accesses set the PF_SGX flag (bit 15) in the page > +fault error code[2]_. 
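For what it's worth, the PF_SGX paragraph above might be easier to follow with a tiny illustration. Here is a minimal Python sketch of classifying a page fault error code. Only "PF_SGX is bit 15" comes from the quoted text; the helper names are hypothetical, and the other three flags are the usual low x86 #PF error-code bits, included just for context.

```python
# Hypothetical helpers; only the PF_SGX bit position is taken from the patch.
PF_PROT = 1 << 0    # fault on a present page (protection violation)
PF_WRITE = 1 << 1   # faulting access was a write
PF_USER = 1 << 2    # faulting access originated in user mode
PF_SGX = 1 << 15    # SGX-specific access-control violation


def is_sgx_fault(error_code: int) -> bool:
    """Return True if the #PF error code has the PF_SGX flag set."""
    return bool(error_code & PF_SGX)


def describe_fault(error_code: int) -> str:
    """Join the names of the known flags that are set in error_code."""
    flags = (
        ("PROT", PF_PROT),
        ("WRITE", PF_WRITE),
        ("USER", PF_USER),
        ("SGX", PF_SGX),
    )
    names = [name for name, bit in flags if error_code & bit]
    return "|".join(names) or "none"
```

E.g. describe_fault(0x8004) reports a user-mode fault with PF_SGX set, which per the text has nothing to do with conventional paging.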
> + > +Although all EPC implementations will undoubtedly encrypt the EPC itself, all > +all EPC code/data is stored unencrypted in the processor's caches. I.e. SGX drop duplicate "all" above. > +relies on the aforementioned mechanisms to protect an enclave's secrets while > +they are resident in the cache. > + > +Note, EPC pages are always 4KB sized and aligned. Software can map EPC using > +using large pages, but the processor always operates on a 4KB granularity when drop duplicate "using" above. > +working with EPC pages. > + > + > +SGX instructions > +================ > + > +SGX introduces three new instructions, ENCLS, ENCLU and ENCLV, for Supervisor, > +User and Virtualization respectively. ENCL{S,U,V} are umbrella instructions, > +using a single opcode as the front end to a variety of SGX functions. The leaf > +function to execute is specified via %eax, with %rbx, %rcx and %rdx optionally > +used for leaf-specific purposes. > + > +Note that supervisor software, i.e. the kernel, creates and manages enclaves, > +but only user-level software can execute/enter an enclave. > + > +ENCLS Leafs Plural is usually Leaves, but I'll leave (no pun intended) that up to you. 
> +----------- > + > + - ECREATE: create an enclave > + - EADD: add page to an uninitialized enclave > + - EAUG: add page to an initialized enclave > + - EEXTEND: extended the measurement of an (uninitialized) enclave > + - EINIT: verify and initialize enclave > + - EDBG{RD,WR}: read/write from/to a debug enclave’s memory > + - EMODPR: restrict an EPC page’s permissions > + - EMODT: modify an EPC page’s type > + - EBLOCK: mark a page as blocked in EPCM > + - ETRACK{C}: activate blocking tracing > + - EWB: write back page from EPC to regular memory > + - ELD{B,U}{C}: load page in {un}blocked state from system memory to EPC > + - EPA: add version array (use to track evicted EPC pages) > + - EREMOVE: remove a page from EPC > + - ERDINFO: retrieve info about an EPC page from EPCM > + > +ENCLU Leafs > +----------- > + - EENTER: enter an enclave > + - ERESUME: resume execution of an interrupted enclave > + - EEXIT: exit an enclave > + - EGETKEY: retrieve a cryptographic key from the processor > + - EREPORT: generate a cryptographic report describing an enclave > + - EMODPE: extend an EPC page's permissions > + - EACCEPT: accept changes to an EPC page > + - EACCEPTCOPY: copy an existing EPC page to an uninitialized EPC page > + > +ENCLV Leafs > +----------- > + - E{DEC,INC}VIRTCHILD: {dec,inc}rement SECS virtual refcount > + - ESETCONTEXT: set SECS’ context pointer > + > + > +EPC page types > +============== > + > +All pages in the EPC have an explicit page type identifying the type of page. > +The type of page affects the page's accessibility, concurrency requirements, > +lifecycle, etc... > + > +SGX Enclave Control Structure (SECS) > + An enclave is defined and referenced by an SGX Enclave Control Structure. > + When creating an enclave (via ECREATE), software provides a source SECS for > + the enclave, which is copied into a target EPC page. The source SECS > + contains security and measurement information, as well as attributes and > + properties of the enclave. 
Once the SECS is copied into the EPC, it's used > + by the processor to store enclave metadata, e.g. the number of EPC pages > + associated with the enclave, and is no longer directly accessible by > + software. > + > +Regular (REG) > + Regular EPC pages contain the code and data of an enclave. Code and data > + pages can be added to an uninitialized enclave (prior to EINIT) via EADD. > + Post EINIT, pages can be added to an enclave via EAUG. Pages added via > + EAUG must be explicitly accepted by the enclave via EACCEPT or EACCEPTCOPY. > + > +Thread Control Structure (TCS) > + Thread Control Structure pages define the entry points to an enclave and > + track the execution state of an enclave thread. A TCS can only be used by > + a single logical CPU at any given time, but otherwise has no attachment to > + any particular logical CPU. Like regular pages, TCS pages are added to > + enclaves via EADD and EINIT. but not by EAUG? IOW, no changes to a TCS after EINIT? > + > +Version Array (VA) > + Version Array pages contain 512 slots, each of which can contain a version > + number for a page evicted from the EPC. A version number is a unique 8-byte > + value that is fed into the MAC computation used to verify the contents of an What is MAC? I don't see it mentioned anywhere else. > + evicted page when reloading said page into the EPC. VA pages are the only > + page type not directly associated with an enclave, and are allocated in the > + EPC via EPA. Note that VA pages can also be evicted from the EPC, but > + doing so requires another VA page/slot to hold the version number of the VA > + page being evicted. > + > +Trim (TRIM) > + The Trim page type indicates that a page has been trimmed from the enclave’s > + address space and is no longer accessible to enclave software, i.e. is about > + to be removed from the enclave (via EREMOVE). Removing pages from a running > + enclaves requires the enclave to explicit accept the removal (via EACCEPT). 
explicitly > + The intermediate Trim type allows software to batch deallocation operations > + to improve efficiency, e.g. minimize transitions between userspace, enclave > + and kernel. > + > + > +Enclave Page Cache Map > +====================== > + > +The processor tracks EPC pages via the Enclave Page Cache Map (EPCM). The EPCM > +is a processor-managed structure that enforces access restrictions to EPC pages > +in addition to the software-managed page tables. The EPCM contains one entry > +per EPC page, and although the details are implementation specific, all > +implementations contain the following architectural information: > + > + - The status of EPC page with respect to validity and accessibility. > + - An SECS identifier of the enclave to which the page belongs. > + - The type of page: regular, SECS, TCS, VA or TRIM > + - The linear address through which the enclave is allowed to access the page. > + - The specified read/write/execute permissions on that page. > + > +Access violations, e.g. insufficient permissions or incorrect linear address, > +detected via the EPCM result in a page fault (#PF)[1]_ exception being signaled > +by the processor. Page faults due to EPCM violations set the PF_SGX flag > +(bit 15) in the page fault error code[2]_. > + > +The EPCM is consulted if and only if walking the software-managed page tables, > +i.e. the kernel's page tables, succeeds. I.e. the effective permissions for an > +EPC page are a logical AND of the kernel's page tables and the corresponding > +EPCM entry. This allows the kernel to make its page tables more restrictive > +without triggering an EPCM violation, e.g. it may mark an entry as not-present > +prior to evicting a page from the EPC. > + > +**IMPORTANT** For all intents and purposes the SGX architecture allows the > +processor to invalidate all EPCM entries at will, i.e. requires that software > +be prepared to handle an EPCM fault at any time. 
Most processors are expected > +to implement the EPC{M} as a subset of system DRAM that is encrypted with an > +ephemeral key, i.e. a key that is randomly generated at processor reset. As a > +result of using an ephemeral key, the contents of the EPC{M} are lost when the > +processor is powered down as part of an S3 transition or when a virtual machine > +is live migrated to a new physical system. > + > + > +Enclave initialization > +====================== > + > +Because software cannot directly access the EPC except when executing in an > +enclave, an enclave must be built using ENCLS functions (ECREATE and EADD) as > +opposed to simply copying the enclave from the filesystem to memory. Once an > +enclave is built, it must be initialized (via EINIT) before userspace can enter > +the enclave and begin `Enclave execution`_. > + > +During the enclave build process, two "measurements", i.e. SHA-256 hashes, are > +taken of the enclave: MRENCLAVE and MRSIGNER. MRENCLAVE measures the enclave's > +contents, e.g. code/data explicitly added to the measurement (via EEXTEND), as > +well as metadata from the enclave's build process, e.g. pages offsets (relative > +to the enclave's base) and page permissions of all pages added to the enclave > +(via EADD). MRENCLAVE is initialized by ECREATE and finalized by EINIT. > +MRSIGNER is simply the SHA-256 hash of the public key used to sign the enclave. > + > +EINIT accepts two parameters in addition to the SECS of the target enclave: an > +Enclave Signature Struct (SIGSTRUCT) and an EINIT token (EINITTOKEN). > +SIGSTRUCT is a structure created and signed by the enclave's developer. Among > +other fields, SIGSTRUCT contains the expected MRENCLAVE of the enclave and the > +MRSIGNER of the enclave. SIGSTRUCT's MRENCLAVE is used by the processor to > +verify that the enclave was properly built (at runtime), and its SIGSTRUCT is > +copied to the SECS upon successful EINIT. 
EINITTOKEN is an optional parameter > +that is consumed as part of `Launch Control`_. > + > + > +Enclave execution > +================= > + > +Enclaves execute in a bespoke sub-mode of ring 3, appropriately named Enclave > +Mode. Enclave Mode changes behavior in key ways to support SGX's security > +guarantees and to reduce the probability of unintentional disclosure of > +sensitive data. > + > +A notable cornerstone of Enclave Mode is the Enclave Linear Range (ELRANGE). > +An enclave is associated with one, and only one, contiguous linear address > +range, its ELRANGE. The ELRANGE is specified via the SIZE and BASEADDR fields > +in the SECS (provided to ECREATE). The processor queries the active enclave's > +ELRANGE to differentiate enclave and non-enclave accesses, i.e. accesses that > +originate in Enclave Mode *and* whose linear address falls within ELRANGE are > +considered (direct) enclave accesses. Note, the processor also generates > +(indirect) enclave accesses when executing ENCL* instructions, which may occur > +outside of Enclave Mode, e.g. when copying the SECS to its target EPC page > +during ECREATE. > + > +Enclave Mode changes include, but are not limited to: > + > + - Permits direct software access to EPC pages owned by the enclave > + - Ensures enclave accesses map to the EPC (EPCM violation, i.e. #PF w/ PF_SGX) > + - Prevents executing code outside the enclave's ELRANGE (#GP fault) > + - Changes the behavior of exceptions/events > + - Causes many instructions to become illegal, i.e. generate an exception > + - Supresses all instruction breakpoints* Suppresses > + - Suppresses data breakpoints within enclave's ELRANGE* > + > + * For non-debug enclaves. > + > +Transitions to/from Enclave Mode have semantics that are a lovely blend of > +SYSCALL, SYSRET and VM-Exit. In normal execution, entering and exiting Enclave > +Mode can only be done through EENTER and EEXIT respectively. EENTER+EEXIT is > +analogous to SYSCALL+SYSRET, e.g. 
EENTER/SYSCALL load RCX with the next RIP and > +EEXIT/SYSRET load RIP from R{B,C}X, and EENTER can only jump to a predefined > +location controlled by the enclave/kernel. > + > +But when an exception, interrupt, VM-Exit, etc... occurs, enclave transitions etc. occurs, > +behave more like VM-Exit and VMRESUME. To maintain the black box nature of the > +enclave, the processor automatically switches register context when any of the > +aforementioned events occur (the SDM refers to such events as Enclave Exiting > +Events (EEE)). > + > +To handle an EEE, the processor performs an Asynchronous Enclave Exits (AEX). Exit {?} > +Note, although exceptions and traps are synchronous from a processor execution > +perspective, the are asynchronous from the enclave's perspective as the enclave > +is not provided an opportunity to save/fuzz state prior to exiting the enclave. > +On an AEX, the processor exits the enclave to a predefined %rip called the > +Asynchronous Exiting Pointer (AEP). The AEP is specified at enclave entry (via > +EENTER/ERESUME) and saved into the associated TCS, similar to how a hypervisor > +specifies the VM-Exit target (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME), i.e. the > +the AEP is an exit location controlled by the enclave's untrusted runtime. > + > +On an AEX, the processor fully exits the enclave prior to vectoring the event, > +i.e. from the event handler's perspective the event occurred at the AEP. Thus, > +IRET/RSM/VMRESUME (from the event handler) returns control to the enclave's > +untrusted runtime, which can take appropriate action, e.g. immediately ERESUME > +the enclave on interrupts, forward expected exceptions to the enclave, restart > +the enclave on fatal exceptions, and so on and so forth. > + > +To preserve the enclave's state across AEX events, the processor automatically > +saves architectural into a State Save Area (SSA). Because SGX supports nested saves architectural state into > +AEX events, e.g. 
the untrusted runtime can re-EENTER the enclave after an AEX, > +which can in turn trigger an AEX, the TCS holds a pointer to a stack of SSA > +frames (as opposed to a single SSA), an index to the current SSA frame and the > +total number of available frames. When an AEX occurs, the processor saves the > +architectural state into the TCS's current SSA frame. The untrusted runtime > +can then pop the last SSA frame (off the TCS's stack) via ERESUME, i.e. restart > +the enclave after the AEX is handled. > + > + > +Launch Control > +============== > + > +SGX provides a set of controls, referred to as Launch Control, that governs the > +initialization of enclaves. The processor internally stores a SHA-256 hash of > +a 3072-bit RSA public key, i.e. a MRSIGNER, often referred to as the "LE pubkey > +hash". The LE pubkey hash is used during EINIT to prevent launching an enclave > +without proper authorization. In order for EINIT to succeed, the enclave's > +MRSIGNER (from SIGSTRUCT) *or* the MRSIGNER of the enclave's EINITTOKEN must > +match the LE pubkey hash. > + > +An EINITTOKEN can only be created by a so called Launch Enclave (LE). A LE is so-called > +an enclave with SECS.ATTRIBUTES.EINITTOKEN_KEY=1, which grants it access to the > +EINITTOKEN_KEY (retrieved via EGETKEY). EINITTOKENs provide a ready-built > +mechanism for userspace to bless enclaves without requiring additional kernel > +infrastructure. > + > +Processors that support SGX Launch Control Configuration, enumerated by the > +SGX_LC flag (bit 30 in CPUID 0x7.0x0.ECX), expose the LE pubkey hash as a set > +of four MSRs, aptly named IA32_SGXLEPUBKEYHASH[0-3]. The reset value of the > +MSRs is an internally defined (Intel) key (processors that don't support > +SGX_LC also use an internally defined key, it's just not exposed to software). 
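A short example might help readers of the SGX_LC paragraph above. Below is a minimal Python sketch, assuming the caller already has the ECX value returned by CPUID leaf 0x7, sub-leaf 0x0 (Python cannot execute CPUID itself, so obtaining that value is outside the sketch). The bit position and MSR naming come from the quoted text; the function names are hypothetical.

```python
# Bit 30 of CPUID.(EAX=0x7, ECX=0x0):ECX enumerates SGX_LC, per the patch text.
SGX_LC_BIT = 30


def has_sgx_lc(cpuid_7_0_ecx: int) -> bool:
    """Return True if the given CPUID 0x7.0x0 ECX value has SGX_LC set."""
    return bool((cpuid_7_0_ecx >> SGX_LC_BIT) & 1)


def le_pubkey_hash_msr_names() -> list:
    """Names of the four MSRs that expose the LE pubkey hash."""
    return [f"IA32_SGXLEPUBKEYHASH{i}" for i in range(4)]
```

With SGX_LC clear, the processor still uses an internally defined key, as the text says; it just isn't visible through these MSRs.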
> + > +While the IA32_SGXLEPUBKEYHASH MSRs are readable on any platform that supports > +SGX_LC, the MSRs are only writable if the IA32_FEATURE_CONTROL is locked with > +bit 17 ("SGX Launch Control Enable" per the SDM, or more accurately "SGX LE > +pubkey hash writable") set to '1'. Note, the MSRs are also writable prior to > +`SGX activation`_. > + > +Note, while "Launch Control Configuration" is the official feature name used by > +the Intel SDM, other documentation may use the term "Flexible Launch Control", > +or even simply "Launch Control". Colloquially, the vast majority of usage of > +the term "Launch Control" is synonymous with "Launch Control Configuration". > + > + > +EPC oversubscription > +==================== > + > +SGX supports the concept of EPC oversubscription. Analogous to swapping system > +DRAM to disk, enclave pages can be swapped from the EPC to memory, and later > +reloaded from memory to the EPC. But because the kernel is untrusted, swapping > +pages in/out of the EPC has specialized requirements: > + > + - The kernel cannot directly access EPC memory, i.e. cannot copy data to/from > + the EPC. > + - The kernel must "prove" to hardware that there are no valid TLB entries for > + said page prior to eviction (a stale TLB entry would allow an attacker to > + bypass SGX access controls). > + - When loading a page back into the EPC, hardware must be able to verify > + the integrity and freshness of the data. > + - When loading an enclave page, e.g. regular and TCS pages, hardware must be > + able to associate the page with an SECS, i.e. refcount an enclaves pages. enclave's > + > +To satisfy the above requirements, the CPU provides dedicated ENCLS functions > +to support paging data in/out of the EPC: > + > + - EBLOCK: Mark a page as blocked in the EPC Map (EPCM). Attempting to access > + a blocked page that misses the TLB will fault. > + - ETRACK: Activate TLB tracking. 
Hardware verifies that all translations for > + pages marked as "blocked" have been flushed from the TLB. > + - EPA: Add Version Array page to the EPC (see `EPC page types`_) > + - EWB: Write back a page from EPC to memory, e.g. RAM. Software must > + supply a VA slot, memory to hold the Paging Crypto Metadata (PCMD) of the > + page and obviously backing for the evicted page. > + - ELD*: Load a page in {un}blocked state from memory to EPC. > + > +Swapped EPC pages are {de,en}crypted on their way in/out of the EPC, e.g. EWB > +encrypts and ELDU decrypts. The version number (stored in a VA page) and PCMD > +structure associated with an evicted EPC page seal a page (prevent undetected > +modification) and ensure its freshness (prevent rollback to a stale version of > +the page) while the page resides in unprotected storage, e.g. memory or disk. > + > + > +Attestation > +=========== > + > +SGX provides mechanisms that allow software to implement what Intel refers to > +as Local Attestation (used by enclaves running on a the same physical platform > +to securely identify one another) and Remote Attestation (a process by which an > +enclave attests itself to a remote entity in order to gain the trust of said > +entity). > + > +The details of Local Attestation and Remote Attestation are far beyond the > +scope of this document. Please see Intel's Software Developer's Manual and/or > +use your search engine of choice to learn more about SGX's attestation > +capabilities. > + > + > +EPC Implemenations > +================== > + > +PRM with MEE > +-------------- wrong length underline! > + > +Initial hardware support for SGX implements the EPC by reserving a chunk of > +system DRAM, referred to as Processor Reserved Memory (PRM). A percentage of > +PRM is consumed by the processor to implement the EPCM, with the remainder of > +PRM being exposed to software as the EPC. PRM is configured by firmware via > +dedicated PRM Range Registers (PRMRRs). 
The PRMRRs are locked which are locked as part of SGX activation, i.e. confusing. "are locked which are locked" > +resizing the PRM, and thus EPC, requires rebooting the system. > + > +An autonomous hardware unit called the Memory Encryption Engine (MEE) protects > +the confidentiality, integrity, and freshness of the PRM, e.g. {de,en}crypts > +data as it is read/written from/to DRAM to provide confidentiality. > + > + > +SGX activation > +============== > + > +Before SGX can be fully enabled, e.g. via FEATURE_CONTROL, the platform must > +undergo explicit SGX activation. SGX activation is a mechanism by which the > +processor verifies and locks the platform configuration set by pre-boot > +firmware, e.g. to ensure it satisfies SGX's security requirements. Before > +SGX is activated (and its configuration locked), firmware can modify the > +PRMRRs, e.g. to set the base/size of the PRM and thus EPC, and can also write > +the SGX_LEPUBKEYHASH MSRs. Notably, the latter allows pre-boot firmware to > +lock the SGX_LEPUBKEYHASH MSRs to a non-Intel value by writing the MSRs and > +locking MSR_IA32_FEATURE_CONTROL without setting the "SGX LE pubkey hash > +writable" flag, i.e. making the SGX_LEPUBKEYHASH MSRs readonly. > + > + > +Footnotes > +========= > + > +.. [1] All processors that do not support the SGX2 ISA take an errata and > + signal #GP(0) instead of #PF(PF_SGX) when vectoring EPCM violations and > + faults due to enclave-accesses to non-EPC memory. > + > +.. [2] Note that despite being vectored as a #PF, a #PF with PF_SGX has nothing > + to do with conventional paging. > + -- ~Randy