On Sun, Mar 17, 2019 at 11:14:45PM +0200, Jarkko Sakkinen wrote:
> Intel Software Guard eXtensions (SGX) is a set of CPU instructions that
> can be used by applications to set aside private regions of code and
> data. The code outside the enclave is disallowed to access the memory
> inside the enclave by the CPU access control.
>
> This commit adds the Linux SGX Enclave Driver that provides an ioctl API
> to manage enclaves. The address range for an enclave, commonly referred
> as ELRANGE in the documentation (e.g. Intel SDM), is reserved with
> mmap() against /dev/sgx. After that a set ioctls is used to build
> the enclave to the ELRANGE.

...

> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> new file mode 100644
> index 000000000000..bd8bcd748976
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/sgx/encl.c

...

> +/**
> + * sgx_encl_next_mm() - Iterate to the next mm
> + * @encl: an enclave
> + * @mm: an mm list entry
> + * @iter: iterator status
> + *
> + * Return: the enclave mm or NULL
> + */
> +struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
> +				     struct sgx_encl_mm *mm, int *iter)
> +{
> +	struct list_head *entry;
> +
> +	WARN(!encl, "%s: encl is NULL", __func__);
> +	WARN(!iter, "%s: iter is NULL", __func__);
> +
> +	spin_lock(&encl->mm_lock);
> +
> +	entry = mm ? mm->list.next : encl->mm_list.next;
> +	WARN(!entry, "%s: entry is NULL", __func__);
> +
> +	if (entry == &encl->mm_list) {
> +		mm = NULL;
> +		*iter = SGX_ENCL_MM_ITER_DONE;
> +		goto out;
> +	}
> +
> +	mm = list_entry(entry, struct sgx_encl_mm, list);
> +
> +	if (!kref_get_unless_zero(&mm->refcount)) {
> +		*iter = SGX_ENCL_MM_ITER_RESTART;
> +		mm = NULL;
> +		goto out;
> +	}
> +
> +	if (!atomic_add_unless(&mm->mm->mm_count, 1, 0)) {

This is a use-after-free scenario if mm_count == 0.  Once the count goes
to zero, __mmdrop() begins, at which point this code is racing against
free_mm().  What you want here (or rather, in flows where mm != current->mm
and you want to access PTEs) is mmget_not_zero(), i.e. "unless zero" on
mm_users.  mm_count prevents the mm_struct itself from being freed, but
doesn't protect the page tables.  mm_users protects the page tables, i.e.
lets us safely call sgx_encl_test_and_clear_young() in the reclaimer.

To ensure liveness of the mm itself, register an mmu_notifier for each
mm_struct (I think in sgx_vma_open()).  The notifier's .release callback
would then delete the mm from the enclave's list and drop its reference
(exit_mmap() holds a reference to mm_count, so it's safe to do mmdrop()
in the .release callback).  E.g.:

static void sgx_vma_open(struct vm_area_struct *vma)
{
	...

	rcu_read_lock();
	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
		if (vma->vm_mm == tmp->mm) {
			encl_mm = tmp;
			break;
		}
	}
	rcu_read_unlock();

	if (!encl_mm) {
		encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
		if (!encl_mm)
			goto error;

		encl_mm->encl = encl;
		encl_mm->mm = vma->vm_mm;

		/* the notifier lives in the per-mm entry, see .release below */
		if (mmu_notifier_register(&encl_mm->mmu_notifier, vma->vm_mm)) {
			kfree(encl_mm);
			goto error;
		}

		spin_lock(&encl->mm_lock);
		/* readers walk the list under RCU, so add with _rcu */
		list_add_rcu(&encl_mm->list, &encl->mm_list);
		spin_unlock(&encl->mm_lock);
	}

	...

error:
	<not sure what should go here if we don't kill the enclave>
}

static void sgx_encl_mmu_release(struct mmu_notifier *mn,
				 struct mm_struct *mm)
{
	struct sgx_encl_mm *encl_mm =
		container_of(mn, struct sgx_encl_mm, mmu_notifier);

	spin_lock(&encl_mm->encl->mm_lock);
	list_del_rcu(&encl_mm->list);
	spin_unlock(&encl_mm->encl->mm_lock);

	synchronize_rcu();

	mmdrop(mm);
}

Alternatively, sgx_encl_mmu_release() could mark the encl_mm as dead
instead of removing it from the list, but I don't think that'd mesh well
with an RCU list, i.e. we'd need a regular lock-protected list and a
custom walker.

The only downside with the RCU approach that I can think of is that the
encl_mm would stay on the enclave's list until the enclave or the mm
itself died.  That could result in unnecessary IPIs during reclaim (or
invalidation), but that seems like a minor corner case that could be
avoided in userspace, e.g. don't mmap() an enclave unless you actually
plan on running it.

> +		kref_put(&mm->refcount, sgx_encl_release_mm);
> +		mm = NULL;
> +		*iter = SGX_ENCL_MM_ITER_RESTART;
> +		goto out;
> +	}
> +
> +	*iter = SGX_ENCL_MM_ITER_NEXT;
> +
> +out:
> +	spin_unlock(&encl->mm_lock);
> +	return mm;
> +}
>
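For reference, a minimal sketch of how the mmget_not_zero() suggestion
could slot into the quoted sgx_encl_next_mm() hunk, keeping the patch's
structure and only swapping the refcounting.  This is untested and assumes
the eventual consumer of the iterator pairs it with mmput() once it is
done walking the PTEs:

	/*
	 * Pin mm_users rather than mm_count: a non-zero mm_users keeps
	 * the page tables alive, which is what the reclaimer needs in
	 * order to call sgx_encl_test_and_clear_young() safely.  If the
	 * mm is already exiting, skip it and make the caller restart
	 * the walk.
	 */
	if (!mmget_not_zero(mm->mm)) {
		kref_put(&mm->refcount, sgx_encl_release_mm);
		mm = NULL;
		*iter = SGX_ENCL_MM_ITER_RESTART;
		goto out;
	}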