On Fri, Jun 21, 2019 at 12:03:36AM +0300, Jarkko Sakkinen wrote: > On Wed, Jun 19, 2019 at 03:23:50PM -0700, Sean Christopherson wrote: > > Using per-vma refcounting to track mm_structs associated with an enclave > > requires hooking .vm_close(), which in turn prevents the mm from merging > > vmas (precisely to allow refcounting). > > Why having sgx_vma_close() prevents that? I do not understand the > problem statement. vmas that define .vm_close() cannot be merged. /* * If the vma has a ->close operation then the driver probably needs to release * per-vma resources, so we don't attempt to merge those. */ static inline int is_mergeable_vma(struct vm_area_struct *vma, struct file *file, unsigned long vm_flags, struct vm_userfaultfd_ctx vm_userfaultfd_ctx) { ... if (vma->vm_ops && vma->vm_ops->close) return 0; if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx)) return 0; return 1; } > > > Avoid refcounting encl_mm altogether by registering an mmu_notifier at > > .mmap(), removing the dying encl_mm at mmu_notifier.release() and > > protecting mm_list during reclaim via a per-enclave SRCU. > > Right, there is the potential collision with my changes: > > 1. Your patch: enclave life-cycle equals life-cycle of all processes > that are associated with the enclave. > 2. My (yet be sent) patch: enclave life-cycle equals the life cycle. > > I won't rush with my patch. I rather merge neither at this point and > you can review mine after you come back from your vacation. > > > Removing refcounting/vm_close() allows merging of enclave vmas, at the > > cost of delaying removal of encl_mm structs from mm_list, i.e. an mm is > > disassociated from an enclave when the mm exits or the enclave dies, as > > opposed to when the last vma (in a given mm) is closed. > > > > The impact of delying encl_mm removal is its memory footprint and > > whatever overhead is incurred during EPC reclaim (to walk an mm's vmas). > > Practically speaking, a stale encl_mm will exist for a meaningful amount > > of time if and only if the enclave is mapped in a long-lived process and > > then passed off to another long-lived process. It is expected that the > > vast majority of use cases will not encounter this condition, e.g. even > > using a daemon to build enclaves should not result in a stale encl_mm as > > the builder should never need to mmap() the enclave. > > This paragraph speaks only about "well behaving" software. Malicious software isn't all that interesting as there are far easier ways to waste system resources. That being said, the encl_mm allocation can use GFP_KERNEL_ACCOUNT. > > Even if there are scenarios that lead to defunct encl_mms, the cost is > > likely far outweighed by the benefits of reducing the number of vmas > > across all enclaves. > > > > Note, using SRCU to protect mm_list is not strictly necessary, i.e. the > > existing walker with encl_mm refcounting could be massaged to work with > > mmu_notifier.release(), but the resulting code is subtle and fragile (I > > never actually got it working). The primary issue is that an encl_mm > > can't be moved off the list until its refcount goes to zero, otherwise > > the custom walker goes off into the weeds. The refcount requirement > > then prevents using mm_list to identify if an mmu_notifier.release() > > has fired, i.e. another mechanism is needed to guard against races > > between exit_mmap() and sgx_release(). > > Is it really impossible to send a separate SRCU patch? I can split out the SRCU as a precursor. It'll likely take me a few days to get it sent. > I fully agree with the SRCU whereas rest of this patch is still > under debate. > > If you could do that, I can merge it in no time. It is a small > step into better direction. > > > Cc: Dave Hansen <dave.hansen@xxxxxxxxx> > > Cc: Andy Lutomirski <luto@xxxxxxxxxx> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Needs to be rebased because the master missing your earlier bug fix. ... > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > > index 9566eb72d417..c6436bbd4a68 100644 > > --- a/arch/x86/kernel/cpu/sgx/encl.c > > +++ b/arch/x86/kernel/cpu/sgx/encl.c > > @@ -132,103 +132,125 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl, > > return entry; > > } > > > > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl, > > - struct mm_struct *mm) > > +static void sgx_encl_mm_release_wq(struct work_struct *work) > > +{ > > + struct sgx_encl_mm *encl_mm = > > + container_of(work, struct sgx_encl_mm, release_work); > > + > > + sgx_encl_mm_release(encl_mm); > > +} > > + > > +/* > > + * Being a call_srcu() callback, this needs to be short, and sgx_encl_release() > > + * is anything but short. Do the final freeing in yet another async callback. > > + */ > > +static void sgx_encl_mm_release_delayed(struct rcu_head *rcu) > > Would rename this either as *_tail() or *_deferred(). Deferred works for me. > > +{ > > + struct sgx_encl_mm *encl_mm = > > + container_of(rcu, struct sgx_encl_mm, rcu); > > + > > + INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_wq); > > + schedule_work(&encl_mm->release_work); > > +} > > + ... > > @@ -118,11 +123,13 @@ void sgx_encl_destroy(struct sgx_encl *encl); > > void sgx_encl_release(struct kref *ref); > > pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page); > > struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index); > > -struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl, > > - struct sgx_encl_mm *encl_mm, int *iter); > > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl, > > - struct mm_struct *mm); > > -void sgx_encl_mm_release(struct kref *ref); > > +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm); > > +static inline void sgx_encl_mm_release(struct sgx_encl_mm *encl_mm) > > +{ > > + kref_put(&encl_mm->encl->refcount, sgx_encl_release); > > + > > + kfree(encl_mm); > > +} > > Please just open code this to the two call sites. Makes the code hard to > follow. Heh, I waffled between a helper and open coding. I chose poorly :-) > Right now I did not find anything else questionable from the code > changes. Repeating myself but if it is by any means possible before > going away, can you construct a pure SRCU patch. > > I could then reconstruct my changes on top off that, which would > make evalution of both heck a lot easier. > > /Jarkko