On Fri, Mar 22, 2019 at 12:19:40PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 21, 2019 at 08:28:10AM -0700, Sean Christopherson wrote:
> > On Thu, Mar 21, 2019 at 04:40:56PM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 18, 2019 at 12:50:43PM -0700, Sean Christopherson wrote:
> > > > On Sun, Mar 17, 2019 at 11:14:41PM +0200, Jarkko Sakkinen wrote:
> > > > Dynamically allocating sgx_epc_sections isn't exactly difficult, and
> > > > AFAICT the static allocation is the primary motivation for capping
> > > > SGX_MAX_EPC_SECTIONS at such a low value (8). I still think it makes
> > > > sense to define SGX_MAX_EPC_SECTIONS so that the section number can
> > > > be embedded in the offset, along with flags. But the max can be
> > > > significantly higher, e.g. using 7 bits to support 128 sections.
> > > >
> > > I don't disagree with you, but I think for the existing and foreseeable
> > > hardware this is good enough. Can be refined if there is ever a need.
> >
> > My concern is that there may be virtualization use cases that want to
> > expose more than 8 EPC sections to a guest. I have no idea if this is
> > anything more than paranoia, but at the same time the cost to increase
> > support to 128+ sections is quite low.
> >
> > > > I realize hardware is highly unlikely to have more than 8 sections, at
> > > > least for the near future, but IMO the small amount of extra complexity
> > > > is worth having a bit of breathing room.
> > >
> > > Yup.
> > >
> > > > > +static __init int sgx_init_epc_section(u64 addr, u64 size, unsigned long index,
> > > > > +				       struct sgx_epc_section *section)
> > > > > +{
> > > > > +	unsigned long nr_pages = size >> PAGE_SHIFT;
> > > > > +	struct sgx_epc_page *page;
> > > > > +	unsigned long i;
> > > > > +
> > > > > +	section->va = memremap(addr, size, MEMREMAP_WB);
> > > > > +	if (!section->va)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	section->pa = addr;
> > > > > +	spin_lock_init(&section->lock);
> > > > > +	INIT_LIST_HEAD(&section->page_list);
> > > > > +
> > > > > +	for (i = 0; i < nr_pages; i++) {
> > > > > +		page = kzalloc(sizeof(*page), GFP_KERNEL);
> > > > > +		if (!page)
> > > > > +			goto out;
> > > > > +		page->desc = (addr + (i << PAGE_SHIFT)) | index;
> > > > > +		sgx_section_put_page(section, page);
> > > > > +	}
> > > >
> > > > Not sure if this is the correct location, but at some point the kernel
> > > > needs to sanitize the EPC during init. EPC pages may be in an unknown
> > > > state, e.g. after kexec(), which will cause all manner of faults and
> > > > warnings. Maybe the best approach is to sanitize on-demand, e.g. suppress
> > > > the first WARN due to unexpected ENCLS failure and purge the EPC at that
> > > > time. The downside of that approach is that exposing EPC to a guest would
> > > > need to implement its own sanitization flow.
> > >
> > > Hmm... Let's think this through. I'm just thinking how sanitization on
> > > demand would actually work given the parent-child relationships.
> >
> > It's ugly.
> >
> > 1. Temporarily disable EPC allocation and enclave fault handling
> > 2. Zap all TCS PTEs in all enclaves
> > 3. Flush all logical CPUs from enclaves via IPI
> > 4. Forcefully reclaim all EPC pages from enclaves
> > 5. EREMOVE all "free" EPC pages, track pages that fail with SGX_CHILD_PRESENT
> > 6. EREMOVE all EPC pages that failed with SGX_CHILD_PRESENT
> > 7. Disable SGX if any EREMOVE failed in step 6
> > 8. Re-enable EPC allocation and enclave fault handling
> >
> > Exposing EPC to a VM would still require sanitization.
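
(Aside, picking up the SGX_MAX_EPC_SECTIONS point further up the thread: a
minimal sketch of what embedding a 7-bit section index in the low bits of
page->desc could look like, assuming desc stays page-aligned as in the
quoted sgx_init_epc_section(). The macro and helper names below are
illustrative only, not taken from the patch:

/*
 * EPC pages are 4K-aligned, so the low 12 bits of desc are free.
 * Reserving bits 0-6 for the section index gives 128 sections and
 * still leaves bits 7-11 for per-page flags.
 */
#define SGX_EPC_SECTION_BITS	7
#define SGX_EPC_SECTION_MASK	GENMASK_ULL(SGX_EPC_SECTION_BITS - 1, 0)
#define SGX_MAX_EPC_SECTIONS	BIT(SGX_EPC_SECTION_BITS)	/* 128 */

static inline unsigned long sgx_epc_page_index(struct sgx_epc_page *page)
{
	return page->desc & SGX_EPC_SECTION_MASK;
}

static inline u64 sgx_epc_page_pa(struct sgx_epc_page *page)
{
	return page->desc & PAGE_MASK;
}

With that layout the per-page bookkeeping does not grow at all when the
section limit is raised; only the sgx_epc_sections array itself would need
to become dynamically allocated, as suggested above.)
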
> >
> > Sanitizing during boot is a lot cleaner; the primary concern is that it
> > will significantly increase boot time on systems with large EPCs. If we
> > can somehow limit this to kexec() and that's the only scenario where the
> > EPC needs to be sanitized, then that would mitigate the boot time concern.
> >
> > We might also be able to get away with unconditionally sanitizing the EPC
> > post-boot, e.g. via worker threads, returning -EBUSY for everything until
> > the EPC is good to go.
>
> I like the worker threads approach better. It is something that is
> maintainable. I don't see any better solution given the hierarchical
> nature of enclaves. It is also fairly easy to implement without making
> major changes to the other parts of the implementation.
>
> I.e. every time the driver initializes:
>
> 1. Move all EPC first to a bad pool.
> 2. Let worker threads move EPC to the real allocation pool.
>
> Then the OS can immediately start to use EPC.
>
> Is this roughly along the lines of what you had in mind?

We could even simplify this by using the already existing reclaimer
thread for the purpose.

/Jarkko
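
(For reference, a rough sketch of what one per-section sanitization pass
along those lines could look like. It assumes each sgx_epc_section gains a
dirty_page_list that every page is placed on at init, i.e. the "bad pool",
that struct sgx_epc_page has a list_head named "list", and that
__eremove() and sgx_epc_addr() wrap ENCLS[EREMOVE] and the page's mapped
address; all of those names are made up for the example, not taken from
the patch:

/*
 * One pass over the section's dirty list.  Only the worker touches
 * dirty_page_list, so it can be walked without the section lock; the
 * lock is only taken when handing clean pages to the allocator's
 * page_list.
 */
static void sgx_sanitize_section(struct sgx_epc_section *section)
{
	struct sgx_epc_page *page, *tmp;
	LIST_HEAD(secs_pages);
	int ret;

	list_for_each_entry_safe(page, tmp, &section->dirty_page_list, list) {
		ret = __eremove(sgx_epc_addr(page));

		spin_lock(&section->lock);
		if (!ret)
			/* Page is clean, make it available for allocation. */
			list_move_tail(&page->list, &section->page_list);
		else
			/*
			 * Most likely SGX_CHILD_PRESENT, i.e. a SECS page
			 * whose children have not been removed yet.  Keep
			 * it aside for a later pass.
			 */
			list_move_tail(&page->list, &secs_pages);
		spin_unlock(&section->lock);

		cond_resched();
	}

	list_splice(&secs_pages, &section->dirty_page_list);
}

The worker thread would call this twice per section, so the second pass
mops up SECS pages whose children were removed on the first, and warn or
disable SGX if the dirty list is still non-empty afterwards. Since the
allocator only ever looks at section->page_list, the rest of the driver
can start allocating, or returning -EBUSY, as soon as the worker is kicked
off, which also seems compatible with reusing the reclaimer thread for the
job.)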