On Tue, 2022-04-05 at 13:03 +0300, Jarkko Sakkinen wrote:
> On Tue, 2022-04-05 at 08:05 +0300, Jarkko Sakkinen wrote:
> > On Mon, 2022-04-04 at 09:49 -0700, Reinette Chatre wrote:
> > > With SGX1 an enclave needs to be created with its maximum memory demands
> > > allocated. Pages cannot be added to an enclave after it is initialized.
> > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > > pages to an initialized enclave. With SGX2 the enclave still needs to
> > > set aside address space for its maximum memory demands during enclave
> > > creation, but all pages need not be added before enclave initialization.
> > > Pages can be added during enclave runtime.
> > >
> > > Add support for dynamically adding pages to an initialized enclave,
> > > architecturally limited to RW permission at creation but allowed to
> > > obtain RWX permissions after enclave runs EMODPE. Add pages via the
> > > page fault handler at the time an enclave address without a backing
> > > enclave page is accessed, potentially directly reclaiming pages if
> > > no free pages are available.
> > >
> > > The enclave is still required to run ENCLU[EACCEPT] on the page before
> > > it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
> > > on an uninitialized address. This will trigger the page fault handler
> > > that will add the enclave page and return execution to the enclave to
> > > repeat the ENCLU[EACCEPT] instruction, this time successful.
> > >
> > > If the enclave accesses an uninitialized address in another way, for
> > > example by expanding the enclave stack to a page that has not yet been
> > > added, then the page fault handler would add the page on the first
> > > write but upon returning to the enclave the instruction that triggered
> > > the page fault would be repeated and since ENCLU[EACCEPT] was not run
> > > yet it would trigger a second page fault, this time with the SGX flag
> > > set in the page fault error code.
> > > This can only be recovered by entering
> > > the enclave again and directly running the ENCLU[EACCEPT] instruction on
> > > the now initialized address.
> > >
> > > Accessing an uninitialized address from outside the enclave also
> > > triggers this flow but the page will remain inaccessible (access will
> > > result in #PF) until accepted from within the enclave via
> > > ENCLU[EACCEPT].
> > >
> > > Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> > > ---
> > > Changes since V2:
> > > - Remove runtime tracking of EPCM permissions
> > >   (sgx_encl_page->vm_run_prot_bits) (Jarkko).
> > > - Move export of sgx_encl_{grow,shrink}() to separate patch. (Jarkko)
> > > - Use sgx_encl_page_alloc(). (Jarkko)
> > > - Set max allowed permissions to be RWX (Jarkko). Update changelog
> > >   to indicate the change and use comment in code as
> > >   created by Jarkko in:
> > >   https://lore.kernel.org/linux-sgx/20220306053211.135762-4-jarkko@xxxxxxxxxx
> > > - Do not set protection bits but let it be inherited by VMA (Jarkko)
> > >
> > > Changes since V1:
> > > - Fix subject line "to initialized" -> "to an initialized" (Jarkko).
> > > - Move text about hardware's PENDING state to the patch that introduces
> > >   the ENCLS[EAUG] wrapper (Jarkko).
> > > - Ensure kernel-doc uses brackets when referring to function.
> > >
> > >  arch/x86/kernel/cpu/sgx/encl.c | 124 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 124 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > > index 546423753e4c..fa4f947f8496 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > @@ -194,6 +194,119 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >  	return __sgx_encl_load_page(encl, entry);
> > >  }
> > >
> > > +/**
> > > + * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
> > > + * @vma: VMA obtained from fault info from where page is accessed
> > > + * @encl: enclave accessing the page
> > > + * @addr: address that triggered the page fault
> > > + *
> > > + * When an initialized enclave accesses a page with no backing EPC page
> > > + * on a SGX2 system then the EPC can be added dynamically via the SGX2
> > > + * ENCLS[EAUG] instruction.
> > > + *
> > > + * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> > > + * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> > > + */
> > > +static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> > > +				     struct sgx_encl *encl, unsigned long addr)
> > > +{
> > > +	struct sgx_pageinfo pginfo = {0};
> > > +	struct sgx_encl_page *encl_page;
> > > +	struct sgx_epc_page *epc_page;
> > > +	struct sgx_va_page *va_page;
> > > +	unsigned long phys_addr;
> > > +	u64 secinfo_flags;
> > > +	vm_fault_t vmret;
> > > +	int ret;
> > > +
> > > +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> > > +		return VM_FAULT_SIGBUS;
> > > +
> > > +	/*
> > > +	 * Ignore internal permission checking for dynamically added pages.
> > > +	 * They matter only for data added during the pre-initialization
> > > +	 * phase. The enclave decides the permissions by the means of
> > > +	 * EACCEPT, EACCEPTCOPY and EMODPE.
> > > +	 */
> > > +	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
> > > +	encl_page = sgx_encl_page_alloc(encl, addr - encl->base, secinfo_flags);
> > > +	if (IS_ERR(encl_page))
> > > +		return VM_FAULT_OOM;
> > > +
> > > +	epc_page = sgx_alloc_epc_page(encl_page, true);
> > > +	if (IS_ERR(epc_page)) {
> > > +		kfree(encl_page);
> > > +		return VM_FAULT_SIGBUS;
> > > +	}
> > > +
> > > +	va_page = sgx_encl_grow(encl);
> > > +	if (IS_ERR(va_page)) {
> > > +		ret = PTR_ERR(va_page);
> > > +		goto err_out_free;
> > > +	}
> > > +
> > > +	mutex_lock(&encl->lock);
> > > +
> > > +	/*
> > > +	 * Copy comment from sgx_encl_add_page() to maintain guidance in
> > > +	 * this similar flow:
> > > +	 * Adding to encl->va_pages must be done under encl->lock. Ditto for
> > > +	 * deleting (via sgx_encl_shrink()) in the error path.
> > > +	 */
> > > +	if (va_page)
> > > +		list_add(&va_page->list, &encl->va_pages);
> > > +
> > > +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> > > +			encl_page, GFP_KERNEL);
> > > +	/*
> > > +	 * If ret == -EBUSY then page was created in another flow while
> > > +	 * running without encl->lock
> > > +	 */
> > > +	if (ret)
> > > +		goto err_out_unlock;
> > > +
> > > +	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> > > +	pginfo.addr = encl_page->desc & PAGE_MASK;
> > > +	pginfo.metadata = 0;
> > > +
> > > +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> > > +	if (ret)
> > > +		goto err_out;
> > > +
> > > +	encl_page->encl = encl;
> > > +	encl_page->epc_page = epc_page;
> > > +	encl_page->type = SGX_PAGE_TYPE_REG;
> > > +	encl->secs_child_cnt++;
> > > +
> > > +	sgx_mark_page_reclaimable(encl_page->epc_page);
> > > +
> > > +	phys_addr = sgx_get_epc_phys_addr(epc_page);
> > > +	/*
> > > +	 * Do not undo everything when creating PTE entry fails - next #PF
> > > +	 * would find page ready for a PTE.
> > > +	 */
> > > +	vmret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> > > +	if (vmret != VM_FAULT_NOPAGE) {
> > > +		mutex_unlock(&encl->lock);
> > > +		return VM_FAULT_SIGBUS;
> > > +	}
> > > +	mutex_unlock(&encl->lock);
> > > +	return VM_FAULT_NOPAGE;
> > > +
> > > +err_out:
> > > +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> > > +
> > > +err_out_unlock:
> > > +	sgx_encl_shrink(encl, va_page);
> > > +	mutex_unlock(&encl->lock);
> > > +
> > > +err_out_free:
> > > +	sgx_encl_free_epc_page(epc_page);
> > > +	kfree(encl_page);
> > > +
> > > +	return VM_FAULT_SIGBUS;
> > > +}
> > > +
> > >  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >  {
> > >  	unsigned long addr = (unsigned long)vmf->address;
> > > @@ -213,6 +326,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >  	if (unlikely(!encl))
> > >  		return VM_FAULT_SIGBUS;
> > >
> > > +	/*
> > > +	 * The page_array keeps track of all enclave pages, whether they
> > > +	 * are swapped out or not. If there is no entry for this page and
> > > +	 * the system supports SGX2 then it is possible to dynamically add
> > > +	 * a new enclave page. This is only possible for an initialized
> > > +	 * enclave that will be checked for right away.
> > > +	 */
> > > +	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> > > +	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> > > +		return sgx_encl_eaug_page(vma, encl, addr);
> > > +
> > >  	mutex_lock(&encl->lock);
> > >
> > >  	entry = sgx_encl_load_page_in_vma(encl, addr, vma->vm_flags);
> >
> > Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > Tested-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>

For what it's worth, I also get a full pass with our test suite, where the
runtime is using EAUG together with EACCEPTCOPY:

    Finished test [unoptimized + debuginfo] target(s) in 0.26s
     Running unittests src/main.rs (target/debug/deps/enarx-ee7f422740eab404)

running 7 tests
test backend::sgx::attestation::tests::request_target_info ... ignored
test backend::sev::snp::tests::test_const_id_macro ... ok
test backend::sev::snp::firmware::test::test_vcek_url ... ok
test backend::sgx::ioctls::tests::restrict_permissions ... ok
test cli::snp::tests::test_empty_cache_path ... ok
test workldr::wasmldr::test::is_builtin ... ok
test cli::snp::tests::test_get_or_write ... ok

test result: ok. 6 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.20s

     Running tests/c_integration_tests.rs (target/debug/deps/c_integration_tests-f7a69c2274f59f90)

running 21 tests
test get_att ... ignored
test bind ... ok
test clock_gettime ... ok
test close ... ok
test exit_one ... ok
test getegid ... ok
test geteuid ... ok
test sgx_get_att_quote ... ignored
test sgx_get_att_quote_size ... ignored
test exit_zero ... ok
test getgid ... ok
test write_emsgsize ... ignored
test write_stderr ... ignored
test getuid ... ok
test listen ... ok
test read ... ok
test read_udp ... ok
test readv ... ok
test socket ... ok
test uname ... ok
test write_stdout ... ok

test result: ok. 16 passed; 0 failed; 5 ignored; 0 measured; 0 filtered out; finished in 18.46s

     Running tests/rust_integration_tests.rs (target/debug/deps/rust_integration_tests-0122fb231e20ea63)

running 6 tests
test rust_sev_attestation ... ignored
test echo ... ok
test cpuid ... ok
test memory_stress_test ... ok
test memspike ... ok
test unix_echo ... ok

test result: ok. 5 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 48.22s

     Running tests/wasmldr_tests.rs (target/debug/deps/wasmldr_tests-98b6ff656b9d815e)

running 9 tests
test check_tcp ... ok
test hello_wasi_snapshot1 ... ok
test memspike ... ok
test echo has been running for over 60 seconds
test memory_stress_test has been running for over 60 seconds
test no_export has been running for over 60 seconds
test return_1 has been running for over 60 seconds
test wasi_snapshot1 has been running for over 60 seconds
test memory_stress_test ... ok
Error: default export in '' is not a function
test no_export ... ok
test return_1 ... ok
test wasi_snapshot1 ... ok
test zerooneone ... ok
test echo ... ok

test result: ok. 9 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 102.75s

BR,
Jarkko