On Wed, Sep 26, 2018 at 01:16:59PM -0700, Dave Hansen wrote:
> On 09/26/2018 11:12 AM, Andy Lutomirski wrote:
> >> e omniscient.
> >>
> >> How about this? With formatting changes since it's long-winded...
> >>
> >> /*
> >>  * Access is blocked by the Enclave Page Cache Map (EPCM), i.e. the
> >>  * access is allowed by the PTE but not the EPCM. This usually happens
> >>  * when the EPCM is yanked out from under us, e.g. by hardware after a
> >>  * suspend/resume cycle. In any case, software, i.e. the kernel, can't
> >>  * fix the source of the fault as the EPCM can't be directly modified
> >>  * by software. Handle the fault as an access error in order to signal
> >>  * userspace, e.g. so that userspace can rebuild their enclave(s), even
> >>  * though userspace may not have actually violated access permissions.
> >>  */
> >>
> > Looks good to me.
>
> Including the actual architectural definition of the bit might add some
> clarity. The SDM explicitly says (Vol 3a section 4.7):
>
>     The fault resulted from violation of SGX-specific access-control
>     requirements.
>
> Which totally squares with returning true from access_error().
>
> There's also a tidbit that says:
>
>     This flag is 1 if the exception is unrelated to paging and
>     resulted from violation of SGX-specific access-control
>     requirements. ... such a violation can occur only if there
>     is no ordinary page fault...
>
> This is pretty important. It means that *none* of the other
> paging-related stuff that we're doing applies.
>
> We also need to clarify how this can happen. Is it through something
> than an app does, or is it solely when the hardware does something under
> the covers, like suspend/resume.

Are you looking for something in the changelog, the comment, or just
a response? If it's the latter...

On bare metal with a bug-free kernel, the only scenario I'm aware of
where we'll encounter these faults is when hardware pulls the rug out
from under us.
In a virtualized environment all bets are off because the architecture
allows VMMs to silently "destroy" the EPC at will; e.g. KVM, and I
believe Hyper-V, will take advantage of this behavior to support live
migration. Post migration, the destination system will generate PF_SGX
because the EPC{M} can't be migrated between systems, i.e. the
destination EPCM sees all EPC pages as invalid.
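
To make the access_error() behavior discussed above concrete, here is a
rough user-space sketch of the check. It is not the actual patch. The
error-code bit values follow the SDM page-fault error code layout (SGX is
bit 15), while `struct vma_perms` and `access_error_sketch()` are
hypothetical names invented for this illustration, and the non-SGX checks
are heavily simplified compared to the real function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits, per the SDM's error code layout. */
#define X86_PF_PROT   (1u << 0)   /* fault on a present page */
#define X86_PF_WRITE  (1u << 1)   /* fault was caused by a write */
#define X86_PF_USER   (1u << 2)   /* fault originated in user mode */
#define X86_PF_SGX    (1u << 15)  /* SGX access-control violation */

/* Hypothetical stand-in for the permission bits of a VMA. */
struct vma_perms {
	bool readable;
	bool writable;
};

/*
 * Simplified model of the check: returns true if the fault should be
 * treated as an access error, i.e. signaled to userspace.
 */
static bool access_error_sketch(uint32_t error_code,
				const struct vma_perms *vma)
{
	/*
	 * The EPCM blocked the access even though the PTE allowed it.
	 * Software cannot fix the EPCM, so report an access error no
	 * matter what the VMA permissions say.
	 */
	if (error_code & X86_PF_SGX)
		return true;

	/* Write fault: an error only if the mapping is not writable. */
	if (error_code & X86_PF_WRITE)
		return !vma->writable;

	/* Read of a present page that still faulted: access error. */
	if (error_code & X86_PF_PROT)
		return true;

	/* Read of a not-present page: error if the mapping is unreadable. */
	return !vma->readable;
}
```

The SGX check comes first because, per the SDM text quoted above, the flag
is only set when there is no ordinary page fault, so none of the later
paging-related checks are meaningful for such a fault.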