On Fri, Mar 22, 2019 at 12:19:40PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 21, 2019 at 08:28:10AM -0700, Sean Christopherson wrote:
> > On Thu, Mar 21, 2019 at 04:40:56PM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 18, 2019 at 12:50:43PM -0700, Sean Christopherson wrote:
> > > > On Sun, Mar 17, 2019 at 11:14:41PM +0200, Jarkko Sakkinen wrote:
> > > > Dynamically allocating sgx_epc_sections isn't exactly difficult, and
> > > > AFAICT the static allocation is the primary motivation for capping
> > > > SGX_MAX_EPC_SECTIONS at such a low value (8). I still think it makes
> > > > sense to define SGX_MAX_EPC_SECTIONS so that the section number can
> > > > be embedded in the offset, along with flags. But the max can be
> > > > significantly higher, e.g. using 7 bits to support 128 sections.
> > > >
> > > I don't disagree with you, but I think for the existing and foreseeable
> > > hardware this is good enough. Can be refined if there is ever a need.
> >
> > My concern is that there may be virtualization use cases that want to
> > expose more than 8 EPC sections to a guest. I have no idea if this is
> > anything more than paranoia, but at the same time the cost to increase
> > support to 128+ sections is quite low.
> >
> > > > I realize hardware is highly unlikely to have more than 8 sections, at
> > > > least for the near future, but IMO the small amount of extra complexity
> > > > is worth having a bit of breathing room.
> > >
> > > Yup.
> > >
> > > > > +static __init int sgx_init_epc_section(u64 addr, u64 size, unsigned long index,
> > > > > +				       struct sgx_epc_section *section)
> > > > > +{
> > > > > +	unsigned long nr_pages = size >> PAGE_SHIFT;
> > > > > +	struct sgx_epc_page *page;
> > > > > +	unsigned long i;
> > > > > +
> > > > > +	section->va = memremap(addr, size, MEMREMAP_WB);
> > > > > +	if (!section->va)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	section->pa = addr;
> > > > > +	spin_lock_init(&section->lock);
> > > > > +	INIT_LIST_HEAD(&section->page_list);
> > > > > +
> > > > > +	for (i = 0; i < nr_pages; i++) {
> > > > > +		page = kzalloc(sizeof(*page), GFP_KERNEL);
> > > > > +		if (!page)
> > > > > +			goto out;
> > > > > +		page->desc = (addr + (i << PAGE_SHIFT)) | index;
> > > > > +		sgx_section_put_page(section, page);
> > > > > +	}
> > > >
> > > > Not sure if this is the correct location, but at some point the kernel
> > > > needs to sanitize the EPC during init. EPC pages may be in an unknown
> > > > state, e.g. after kexec(), which will cause all manner of faults and
> > > > warnings. Maybe the best approach is to sanitize on-demand, e.g. suppress
> > > > the first WARN due to unexpected ENCLS failure and purge the EPC at that
> > > > time. The downside of that approach is that exposing EPC to a guest would
> > > > need to implement its own sanitization flow.
> > >
> > > Hmm... Let's think this through. I'm just thinking how sanitization on
> > > demand would actually work given the parent-child relationships.
> >
> > It's ugly.
> >
> > 1. Temporarily disable EPC allocation and enclave fault handling
> > 2. Zap all TCS PTEs in all enclaves
> > 3. Flush all logical CPUs from enclaves via IPI
> > 4. Forcefully reclaim all EPC pages from enclaves
> > 5. EREMOVE all "free" EPC pages, track pages that fail with SGX_CHILD_PRESENT
> > 6. EREMOVE all EPC pages that failed with SGX_CHILD_PRESENT
> > 7. Disable SGX if any EREMOVE failed in step 6
> > 8. Re-enable EPC allocation and enclave fault handling
> >
> > Exposing EPC to a VM would still require sanitization.
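
(Aside, picking up the SGX_MAX_EPC_SECTIONS point further up the thread: a
minimal sketch of what embedding a 7-bit section index in the low bits of
page->desc could look like, assuming desc stays page-aligned as in the
quoted sgx_init_epc_section(). The macro and helper names below are
illustrative only, not taken from the patch:

/*
 * EPC pages are 4K-aligned, so the low 12 bits of desc are free.
 * Reserving bits 0-6 for the section index gives 128 sections and
 * still leaves bits 7-11 for per-page flags.
 */
#define SGX_EPC_SECTION_BITS	7
#define SGX_EPC_SECTION_MASK	GENMASK_ULL(SGX_EPC_SECTION_BITS - 1, 0)
#define SGX_MAX_EPC_SECTIONS	BIT(SGX_EPC_SECTION_BITS)	/* 128 */

static inline unsigned long sgx_epc_page_index(struct sgx_epc_page *page)
{
	return page->desc & SGX_EPC_SECTION_MASK;
}

static inline u64 sgx_epc_page_pa(struct sgx_epc_page *page)
{
	return page->desc & PAGE_MASK;
}

With that layout the per-page bookkeeping does not grow at all when the
section limit is raised; only the sgx_epc_sections array itself would need
to become dynamically allocated, as suggested above.)
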
> >
> > Sanitizing during boot is a lot cleaner; the primary concern is that it
> > will significantly increase boot time on systems with large EPCs. If we
> > can somehow limit this to kexec() and that's the only scenario where the
> > EPC needs to be sanitized, then that would mitigate the boot time concern.
> >
> > We might also be able to get away with unconditionally sanitizing the EPC
> > post-boot, e.g. via worker threads, returning -EBUSY for everything until
> > the EPC is good to go.
>
> I like the worker threads approach better. It is something that is
> maintainable. I don't see any better solution given the hierarchical
> nature of enclaves. It is also fairly easy to implement without making
> major changes to the other parts of the implementation.
>
> I.e. every time the driver initializes:
>
> 1. Move all EPC first to a bad pool.
> 2. Let worker threads move EPC to the real allocation pool.
>
> Then the OS can immediately start to use EPC.
>
> Is this roughly along the lines of what you had in mind?

We could even simplify this by using the already existing reclaimer
thread for the purpose.

/Jarkko
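
(For reference, a rough sketch of what one per-section sanitization pass
along those lines could look like. It assumes each sgx_epc_section gains a
dirty_page_list that every page is placed on at init, i.e. the "bad pool",
that struct sgx_epc_page has a list_head named "list", and that
__eremove() and sgx_epc_addr() wrap ENCLS[EREMOVE] and the page's mapped
address; all of those names are made up for the example, not taken from
the patch:

/*
 * One pass over the section's dirty list.  Only the worker touches
 * dirty_page_list, so it can be walked without the section lock; the
 * lock is only taken when handing clean pages to the allocator's
 * page_list.
 */
static void sgx_sanitize_section(struct sgx_epc_section *section)
{
	struct sgx_epc_page *page, *tmp;
	LIST_HEAD(secs_pages);
	int ret;

	list_for_each_entry_safe(page, tmp, &section->dirty_page_list, list) {
		ret = __eremove(sgx_epc_addr(page));

		spin_lock(&section->lock);
		if (!ret)
			/* Page is clean, make it available for allocation. */
			list_move_tail(&page->list, &section->page_list);
		else
			/*
			 * Most likely SGX_CHILD_PRESENT, i.e. a SECS page
			 * whose children have not been removed yet.  Keep
			 * it aside for a later pass.
			 */
			list_move_tail(&page->list, &secs_pages);
		spin_unlock(&section->lock);

		cond_resched();
	}

	list_splice(&secs_pages, &section->dirty_page_list);
}

The worker thread would call this twice per section, so the second pass
mops up SECS pages whose children were removed on the first, and warn or
disable SGX if the dirty list is still non-empty afterwards. Since the
allocator only ever looks at section->page_list, the rest of the driver
can start allocating, or returning -EBUSY, as soon as the worker is kicked
off, which also seems compatible with reusing the reclaimer thread for the
job.)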