On Mon, Jan 24, 2022 at 07:42:08PM +0200, Jarkko Sakkinen wrote: > On Sat, Jan 22, 2022 at 01:15:01PM -0600, Haitao Huang wrote: > > Hi Dave, > > > > On Fri, 21 Jan 2022 13:57:50 -0600, Dave Hansen <dave.hansen@xxxxxxxxx> > > wrote: > > > > > Hi Everyone, > > > > > > There are a few SGX fixes that have showed up in the last week or so, > > > mostly around RAS and fixing the backing storage issues. Could folks > > > please give this branch a good thrashing? > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/log/?h=x86/sgx > > > > > > I'm planning to send this bunch up to Linus after 5.17-rc1 comes out. > > > > > > Kristen, I really dug into the changelogs of your two patches to make it > > > more clear that they are bugfix and stable@ material. I'd appreciate > > > some additional eyeballs there. > > > > When testing this with a large enclave loaded by Intel runtime, we saw it > > hang > > on EADD ioctl and the ioctl never returns. > > > > Also noticed the selftest fails on oversub but the return errno is EINTR not > > ENOMEM as expected. > > > > When user space register signal handler with SA_RESTART flag, EINTR would > > be an auto restart of ioctl. So that might be what's going on with Intel > > runtime test. And I suspect following code may cause infinite restart of the > > ioctl: > > > > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > > { > > struct sgx_epc_page *page; > > > > for ( ; ; ) { > > page = __sgx_alloc_epc_page(); > > if (!IS_ERR(page)) { > > page->owner = owner; > > break; > > } > > > > if (list_empty(&sgx_active_page_list))<-- should also check > > sgx_nr_available_backing_pages? > > return ERR_PTR(-ENOMEM); I don't think so but why do you think it should? > > > > if (!reclaim) { > > page = ERR_PTR(-EBUSY); > > break; > > } > > > > if (signal_pending(current)) { > > page = ERR_PTR(-ERESTARTSYS);<---- this is the only exit of > > the for loop when backing store limit reaches. > > break; > > } > > > > sgx_reclaim_pages(); <---- no checking for backing store fail due > > to limit here. Initially I'd think you're making here the right conclusion. Given the limit sgx_reclaim_pages() should return e.g. boolean or int to tell that we are out of backing storage. Kristen? > > cond_resched(); > > } > > > > Thanks > > Haitao The reason was that it was not x86/sgx branch of tip but this: https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/log/?h=x86/sgx Sorry for initial reject. BR, Jarkko