On 10/17/20 10:03 PM, Jarkko Sakkinen wrote: > On Fri, Oct 16, 2020 at 02:25:50PM -0700, Dave Hansen wrote: >>> +/** >>> + * struct sgx_enclave_add_pages - parameter structure for the >>> + * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl >>> + * @src: start address for the page data >>> + * @offset: starting page offset >> >> Is this the offset *within* the page? Might be nice to say that. > > It's the offset in the enclave address range where page is to be added. Yikes, comment improvement needed, badly. >>> +static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, >>> + unsigned long offset, >>> + u64 secinfo_flags) >>> +{ >>> + struct sgx_encl_page *encl_page; >>> + unsigned long prot; >>> + >>> + encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL); >>> + if (!encl_page) >>> + return ERR_PTR(-ENOMEM); >>> + >>> + encl_page->desc = encl->base + offset; >>> + encl_page->encl = encl; >> >> Somewhere, we need an explanation of why we have 'sgx_epc_page' and >> 'sgx_encl_page'. I think they're 1:1 at least after >> sgx_encl_page_alloc(), so I'm wondering why we need two. > > You need sgx_encl_page to hold data that exists whether or not there is > an associated EPC page. Except they're currently tightly bound: > encl_page = sgx_encl_page_alloc(encl, offset, secinfo->flags); > if (IS_ERR(encl_page)) > return PTR_ERR(encl_page); > > - epc_page = __sgx_alloc_epc_page(); > + epc_page = sgx_alloc_epc_page(encl_page, true); > if (IS_ERR(epc_page)) { > kfree(encl_page); > return PTR_ERR(epc_page); > } So, is this because 'sgx_encl_page' continues to exist even if 'sgx_epc_page' is reclaimed? > Essentially sgx_encl_page contains the data needed for a virtual page, > and sgx_epc_page what is needed to represent physical page. So, grumble grumble, that's horribly inefficient for sparse mappings. There's a reason VMAs cover ranges instead of being allocated per-virtual-page. > None of the data contained in sgx_encl_page make sense for sgx_epc_page. > They don't contain intersecting or redundant data. Yeah, except they point to each other, so if one isn't necessary, we can get rid of that pointer. >>> +static int __sgx_encl_add_page(struct sgx_encl *encl, >>> + struct sgx_encl_page *encl_page, >>> + struct sgx_epc_page *epc_page, >>> + struct sgx_secinfo *secinfo, unsigned long src) >>> +{ >>> + struct sgx_pageinfo pginfo; >>> + struct vm_area_struct *vma; >>> + struct page *src_page; >>> + int ret; >>> + >>> + /* Deny noexec. */ >>> + vma = find_vma(current->mm, src); >>> + if (!vma) >>> + return -EFAULT; >>> + >>> + if (!(vma->vm_flags & VM_MAYEXEC)) >>> + return -EACCES; >>> + >>> + ret = get_user_pages(src, 1, 0, &src_page, NULL); >>> + if (ret < 1) >>> + return -EFAULT; >>> + >>> + pginfo.secs = (unsigned long)sgx_get_epc_addr(encl->secs.epc_page); >>> + pginfo.addr = SGX_ENCL_PAGE_ADDR(encl_page); >>> + pginfo.metadata = (unsigned long)secinfo; >>> + pginfo.contents = (unsigned long)kmap_atomic(src_page); >>> + >>> + ret = __eadd(&pginfo, sgx_get_epc_addr(epc_page)); >> >> Could you convince me that EADD is not going to fault and make the >> kmap_atomic() mad? > > It can legitly fail in the case when power cycle happens. > > That's why the inline assembly catches faults and return an error code. > Thhis code has been field tested a lot. I have fairly good trust on > it. OK, so it can fault, but not *sleep*. Can you comment it to that effect, please? >>> + if (ret) { >>> + if (encls_failed(ret)) >>> + ENCLS_WARN(ret, "EEXTEND"); >>> + return -EIO; >> >> How frequent should we expect these to be? Can users cause them? You >> should *proably* call it ENCLS_WARN_ONCE() if it's implemented that way. > > If power cycle happens. So, we get one warning per power cycle? Practically, do you mean a suspend/resume cycle, or is this more like hibernation-to-disk-resume? In any case, if this is normal system operation (which closing my laptop lid qualifies as), it should produce zero warnings. >>> +static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src, >>> + unsigned long offset, struct sgx_secinfo *secinfo, >>> + unsigned long flags) >>> +{ >>> + struct sgx_encl_page *encl_page; >>> + struct sgx_epc_page *epc_page; >>> + int ret; >>> + >>> + encl_page = sgx_encl_page_alloc(encl, offset, secinfo->flags); >>> + if (IS_ERR(encl_page)) >>> + return PTR_ERR(encl_page); >>> + >>> + epc_page = __sgx_alloc_epc_page(); >>> + if (IS_ERR(epc_page)) { >>> + kfree(encl_page); >>> + return PTR_ERR(epc_page); >>> + } >> >> Looking at these, I'm forgetting why we need to both allocate an >> encl_page and an epc_page. Commends might remind me. So would better >> names. > > Should the struct names be renamed? > > Like sgx_phys_epc_page and sgx_virt_epc_page? "epc" is additional acronym nonsense and redundant with "sgx" and "page" anyway. I'd probably call then 'sgx_phys_page' and 'sgx_virt_slot' or something. >>> + mmap_read_lock(current->mm); >>> + mutex_lock(&encl->lock); >>> + >>> + /* >>> + * Insert prior to EADD in case of OOM. >> >> I wouldn't say OOM. Maybe: >> >> xa_insert() and EADD can both fail. But xa_insert() is easier >> to unwind so do it first. >> >>> EADD modifies MRENCLAVE, i.e. >> >> What is MRENCLAVE? > > The measurement stored in SECS. I'm wondering with xarray, is it > possible to preallocate entry without inserting anything? Let's use plain english here. I don't care what the implementation does, I just care about what it means to the kernel. > Then we could get rid of this unwind and also would not need to > take encl->lock in sgx_encl_may_map(). There was for radix trees, iirc. >>> + * can't be gracefully unwound, while failure on EADD/EXTEND is limited >>> + * to userspace errors (or kernel/hardware bugs). >>> + */ >>> + ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc), >>> + encl_page, GFP_KERNEL); >>> + if (ret) >>> + goto err_out_unlock; >>> + >>> + ret = __sgx_encl_add_page(encl, encl_page, epc_page, secinfo, >>> + src); >>> + if (ret) >>> + goto err_out; >>> + >>> + /* >>> + * Complete the "add" before doing the "extend" so that the "add" >>> + * isn't in a half-baked state in the extremely unlikely scenario >>> + * the enclave will be destroyed in response to EEXTEND failure. >>> + */ >>> + encl_page->encl = encl; >>> + encl_page->epc_page = epc_page; >>> + encl->secs_child_cnt++; >>> + >>> + if (flags & SGX_PAGE_MEASURE) { >>> + ret = __sgx_encl_extend(encl, epc_page); >>> + if (ret) >>> + goto err_out; >>> + } >> >> Why would we never *not* measure an added page? > > You might add Thread Control Structure pages without measuring them or > data area. There are reasons for the user space not to have everything > measured. This is also good comment fodder. >>> +static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg) >>> +{ >>> + struct sgx_enclave_add_pages addp; >>> + struct sgx_secinfo secinfo; >>> + unsigned long c; >>> + int ret; >>> + >>> + if ((atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) || >>> + !(atomic_read(&encl->flags) & SGX_ENCL_CREATED)) >>> + return -EINVAL; >> >> There should to be a nice state machine documented somewhere. Is ther? > > So should I document to encl.h where they are declared to start with? I think it's better placed in the Documentation/. >>> + if (copy_from_user(&addp, arg, sizeof(addp))) >>> + return -EFAULT; >>> + >>> + if (!IS_ALIGNED(addp.offset, PAGE_SIZE) || >>> + !IS_ALIGNED(addp.src, PAGE_SIZE)) >>> + return -EINVAL; >>> + >>> + if (!(access_ok(addp.src, PAGE_SIZE))) >>> + return -EFAULT; >> >> This worries me. You're doing an access_ok() check on addp.src because >> you evidently don't trust it. But, below, it looks to be accessed >> directly with an offset, bound by addp.length, which I think can be >>> PAGE_SIZE. >> >> I'd feel a lot better if addp.src's value was being passed around as a >> __user pointer. > > I'm not sure if that call is even needed. Each page is pinned with > get_user_pages(). AFAIK, it should be enough. This must be legacy cruft. get_user_pages() and access_ok() do *very* different things. Even if the pages are pinned, you might still be tricked into referencing off the end of the page, or up into the kernel address space. >>> + if (addp.length & (PAGE_SIZE - 1)) >>> + return -EINVAL; >>> + >>> + if (addp.offset + addp.length - PAGE_SIZE >= encl->size) >>> + return -EINVAL; >>> + >>> + if (copy_from_user(&secinfo, (void __user *)addp.secinfo, >>> + sizeof(secinfo))) >>> + return -EFAULT; >>> + >>> + if (sgx_validate_secinfo(&secinfo)) >>> + return -EINVAL; >>> + >>> + for (c = 0 ; c < addp.length; c += PAGE_SIZE) { >>> + if (signal_pending(current)) { >>> + if (!c) >>> + ret = -ERESTARTSYS; >>> + >>> + break; >>> + } >>> + >>> + if (c == SGX_MAX_ADD_PAGES_LENGTH) >>> + break; >>> + >>> + if (need_resched()) >>> + cond_resched(); >>> + >>> + ret = sgx_encl_add_page(encl, addp.src + c, addp.offset + c, >>> + &secinfo, addp.flags); >> >> Yeah... Don't we need to do another access_ok() check here, if we >> needed one above since we are moving away from addrp.src? > > I don't think so because the page is pinned with get_user_pages(). No, get_user_pages() is orthogonal. Looking at this again, you _might_ be OK since you validated addp.length against encl->size. But, it's all very convoluted and doesn't look very organized or obviously right. So, this begs the question: What, exactly are the guarantees you are expecting out of get_user_pages() here? I also think it's an absolute requirement that if you're passing around userspace pointers that you tag them as __user, not pass around as unsigned longs.