On Wed, Aug 07, 2019 at 06:33:32AM +0000, Jethro Beekman wrote:
> On 2019-07-13 10:07, Jarkko Sakkinen wrote:
> > Because the kernel is untrusted, swapping pages in/out of the Enclave
> > Page Cache (EPC) has specialized requirements:
> >
> > * The kernel cannot directly access EPC memory, i.e. cannot copy data
> >   to/from the EPC.
> > * To evict a page from the EPC, the kernel must "prove" to hardware that
> >   there are no valid TLB entries for said page since a stale TLB entry
> >   would allow an attacker to bypass SGX access controls.
> > * When loading a page back into the EPC, hardware must be able to verify
> >   the integrity and freshness of the data.
> > * When loading an enclave page, e.g. regular pages and Thread Control
> >   Structures (TCS), hardware must be able to associate the page with a
> >   Secure Enclave Control Structure (SECS).
> >
> > To satisfy the above requirements, the CPU provides dedicated ENCLS
> > functions to support paging data in/out of the EPC:
> >
> > * EBLOCK:   Mark a page as blocked in the EPC Map (EPCM). Attempting
> >             to access a blocked page that misses the TLB will fault.
> > * ETRACK:   Activate blocking tracking. Hardware verifies that all
> >             translations for pages marked as "blocked" have been flushed
> >             from the TLB.
> > * EPA:      Add version array page to the EPC. As the name suggests, a
> >             VA page is a 512-entry array of version numbers that are
> >             used to uniquely identify pages evicted from the EPC.
> > * EWB:      Write back a page from EPC to memory, e.g. RAM. Software
> >             must supply a VA slot, memory to hold the Paging Crypto
> >             Metadata (PCMD) of the page and obviously backing for the
> >             evicted page.
> > * ELD{B,U}: Load a page in {un}blocked state from memory to EPC. The
> >             driver only uses the ELDU variant as there is no use case
> >             for loading a page as "blocked" in a bare metal environment.
> >
> > To top things off, all of the above ENCLS functions are subject to
> > strict concurrency rules, e.g. many operations will #GP fault if two
> > or more operations attempt to access common pages/structures.
> >
> > To put it succinctly, paging in/out of the EPC requires coordinating
> > with the SGX driver where all of an enclave's tracking resides. But,
> > simply shoving all reclaim logic into the driver is not desirable as
> > doing so has unwanted long term implications:
> >
> > * Oversubscribing EPC to KVM guests, i.e. virtualizing SGX in KVM and
> >   swapping a guest's EPC pages (without the guest's cooperation) needs
> >   the same high level flows for reclaim but has painfully different
> >   semantics in the details.
> > * Accounting EPC, i.e. adding an EPC cgroup controller, is desirable
> >   as EPC is effectively a specialized memory type and even more scarce
> >   than system memory. Providing a single touchpoint for EPC accounting
> >   regardless of end consumer greatly simplifies the EPC controller.
> > * Allowing the userspace-facing driver to be built as a loaded module
> >   is desirable, e.g. for debug, testing and development. The cgroup
> >   infrastructure does not support dependencies on loadable modules.
> > * Separating EPC swapping from the driver once it has been tightly
> >   coupled to the driver is non-trivial (speaking from experience).
>
> Some of these points seem stale now.

Thanks for spotting. I'll do a full edit of the commit message and try
to make it shorter and more to the point.

/Jarkko
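
As a rough illustration of the eviction ordering described in the quoted
commit message (EBLOCK, then ETRACK, then EWB, with ELDU to reload), here
is a toy C model. It is a sketch under stated assumptions, not driver
code: every toy_* name is hypothetical, no ENCLS instruction is executed,
and tracking is modeled per page for brevity even though the real ETRACK
operates on the enclave's SECS.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names below are hypothetical, not kernel API. */

#define TOY_VA_SLOTS 512	/* a VA page holds 512 version slots */

enum toy_status {
	TOY_OK = 0,
	TOY_ERR_NOT_BLOCKED,	/* step attempted before EBLOCK */
	TOY_ERR_NOT_TRACKED,	/* EWB attempted before ETRACK */
	TOY_ERR_NO_VA_SLOT,	/* VA page is full */
	TOY_ERR_BAD_STATE	/* page is not in the expected place */
};

struct toy_va_page {
	bool used[TOY_VA_SLOTS];
};

struct toy_epc_page {
	bool resident;	/* page currently lives in the (modeled) EPC */
	bool blocked;	/* set by the EBLOCK step */
	bool tracked;	/* stands in for "stale TLB entries flushed" */
	int va_slot;	/* slot holding the version, -1 if none */
};

/* Return the first free VA slot, or -1 if all 512 are in use. */
int toy_va_alloc(struct toy_va_page *va)
{
	for (int i = 0; i < TOY_VA_SLOTS; i++) {
		if (!va->used[i]) {
			va->used[i] = true;
			return i;
		}
	}
	return -1;
}

void toy_va_free(struct toy_va_page *va, int slot)
{
	assert(slot >= 0 && slot < TOY_VA_SLOTS);
	va->used[slot] = false;
}

/* EBLOCK step: mark the page as blocked. */
void toy_eblock(struct toy_epc_page *p)
{
	p->blocked = true;
}

/* ETRACK step: only meaningful once the page has been blocked. */
enum toy_status toy_etrack(struct toy_epc_page *p)
{
	if (!p->blocked)
		return TOY_ERR_NOT_BLOCKED;
	p->tracked = true;
	return TOY_OK;
}

/* EWB step: refuse to evict unless the earlier steps were done in order. */
enum toy_status toy_ewb(struct toy_epc_page *p, struct toy_va_page *va)
{
	int slot;

	if (!p->resident)
		return TOY_ERR_BAD_STATE;
	if (!p->blocked)
		return TOY_ERR_NOT_BLOCKED;
	if (!p->tracked)
		return TOY_ERR_NOT_TRACKED;

	slot = toy_va_alloc(va);
	if (slot < 0)
		return TOY_ERR_NO_VA_SLOT;

	p->va_slot = slot;
	p->resident = false;
	return TOY_OK;
}

/* ELDU step: reload the page unblocked and release its VA slot. */
enum toy_status toy_eldu(struct toy_epc_page *p, struct toy_va_page *va)
{
	if (p->resident || p->va_slot < 0)
		return TOY_ERR_BAD_STATE;

	toy_va_free(va, p->va_slot);
	p->va_slot = -1;
	p->blocked = false;
	p->tracked = false;
	p->resident = true;
	return TOY_OK;
}
```

In this model, calling toy_ewb() on a page that has not gone through
toy_eblock() and toy_etrack() returns an error instead of evicting, which
mirrors the "prove there are no stale TLB entries" requirement only at
the level of ordering. Integrity, freshness and the concurrency rules are
not modeled.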