On Tue, 2021-10-12 at 06:57 -0400, Paolo Bonzini wrote: > For bare-metal SGX on real hardware, the hardware provides guarantees > SGX state at reboot. For instance, all pages start out uninitialized. > The vepc driver provides a similar guarantee today for freshly-opened > vepc instances, but guests such as Windows expect all pages to be in > uninitialized state on startup, including after every guest reboot. > > Some userspace implementations of virtual SGX would rather avoid having > to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the > virtual EPC. For example, they could sandbox themselves after the guest > starts and forbid further calls to open(), in order to mitigate exploits > from untrusted guests. > > Therefore, add a ioctl that does this with EREMOVE. Userspace can > invoke the ioctl to bring its vEPC pages back to uninitialized state. > There is a possibility that some pages fail to be removed if they are > SECS pages, and the child and SECS pages could be in separate vEPC > regions. Therefore, the ioctl returns the number of EREMOVE failures, > telling userspace to try the ioctl again after it's done with all > vEPC regions. A more verbose description of the correct usage and > the possible error conditions is documented in sgx.rst. > > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > --- > v1->v2: return EBUSY for SGX_ENCLAVE_ACT, adjust docs > add cond_resched > > Documentation/x86/sgx.rst | 26 +++++++++++++++ > arch/x86/include/asm/sgx.h | 3 ++ > arch/x86/include/uapi/asm/sgx.h | 2 ++ > arch/x86/kernel/cpu/sgx/virt.c | 57 +++++++++++++++++++++++++++++++++ > 4 files changed, 88 insertions(+) > > diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst > index dd0ac96ff9ef..7bc9c3b297d6 100644 > --- a/Documentation/x86/sgx.rst > +++ b/Documentation/x86/sgx.rst > @@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests > on the same machine, the user should reserve enough EPC (by taking out > total virtual EPC size of all SGX VMs from the physical EPC size) for > host SGX applications so they can run with acceptable performance. > + > +Architectural behavior is to restore all EPC pages to an uninitialized > +state also after a guest reboot. Because this state can be reached only > +through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc`` > +provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction > +on all pages in the virtual EPC. > + > +``EREMOVE`` can fail for two reasons, which Linux relays to userspace > +in a different manner: > + > +1. Page removal will always fail when any thread is running in the > + enclave to which the page belongs. In this case the ioctl will > + return ``EBUSY`` independent of whether it has successfully removed > + some pages; userspace can avoid these failures by preventing execution > + of any vcpu which maps the virtual EPC. > + > +2) Page removal will also fail for SGX "SECS" metadata pages which still > + have child pages. Child pages can be removed by executing > + ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors > + mapped into the guest. This means that the ioctl() must be called > + twice: an initial set of calls to remove child pages and a subsequent > + set of calls to remove SECS pages. The second set of calls is only > + required for those mappings that returned a nonzero value from the > + first call. It indicates a bug in the kernel or the userspace client > + if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has > + a return code other than 0. > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h > index 05f3e21f01a7..2e5d8c97655e 100644 > --- a/arch/x86/include/asm/sgx.h > +++ b/arch/x86/include/asm/sgx.h > @@ -50,6 +50,8 @@ enum sgx_encls_function { > * %SGX_NOT_TRACKED: Previous ETRACK's shootdown sequence has not > * been completed yet. > * %SGX_CHILD_PRESENT SECS has child pages present in the EPC. > + * %SGX_ENCLAVE_ACT One or more logical processors are executing > + * inside the enclave. > * %SGX_INVALID_EINITTOKEN: EINITTOKEN is invalid and enclave signer's > * public key does not match IA32_SGXLEPUBKEYHASH. > * %SGX_UNMASKED_EVENT: An unmasked event, e.g. INTR, was received > @@ -57,6 +59,7 @@ enum sgx_encls_function { > enum sgx_return_code { > SGX_NOT_TRACKED = 11, > SGX_CHILD_PRESENT = 13, > + SGX_ENCLAVE_ACT = 14, > SGX_INVALID_EINITTOKEN = 16, > SGX_UNMASKED_EVENT = 128, > }; > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h > index 9690d6899ad9..f4b81587e90b 100644 > --- a/arch/x86/include/uapi/asm/sgx.h > +++ b/arch/x86/include/uapi/asm/sgx.h > @@ -27,6 +27,8 @@ enum sgx_page_flags { > _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init) > #define SGX_IOC_ENCLAVE_PROVISION \ > _IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision) > +#define SGX_IOC_VEPC_REMOVE_ALL \ > + _IO(SGX_MAGIC, 0x04) > > /** > * struct sgx_enclave_create - parameter structure for the > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c > index 59cdf3f742ac..81a0a0f22007 100644 > --- a/arch/x86/kernel/cpu/sgx/virt.c > +++ b/arch/x86/kernel/cpu/sgx/virt.c > @@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page) > return 0; > } > > +static long sgx_vepc_remove_all(struct sgx_vepc *vepc) > +{ > + struct sgx_epc_page *entry; > + unsigned long index; > + long failures = 0; > + > + xa_for_each(&vepc->page_array, index, entry) { > + int ret = sgx_vepc_remove_page(entry); > + switch (ret) { > + case 0: > + break; > + > + case SGX_CHILD_PRESENT: > + failures++; > + break; > + > + case SGX_ENCLAVE_ACT: > + /* > + * Unlike in sgx_vepc_free_page, userspace could be calling > + * the ioctl while logical processors are running in the > + * enclave; do not warn. > + */ > + return -EBUSY; > + > + default: > + WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret); > + failures++; > + break; > + } > + cond_resched(); > + } > + > + /* > + * Return the number of pages that failed to be removed, so > + * userspace knows that there are still SECS pages lying > + * around. > + */ > + return failures; > +} > + > static int sgx_vepc_release(struct inode *inode, struct file *file) > { > struct sgx_vepc *vepc = file->private_data; > @@ -235,9 +274,27 @@ static int sgx_vepc_open(struct inode *inode, struct file *file) > return 0; > } > > +static long sgx_vepc_ioctl(struct file *file, > + unsigned int cmd, unsigned long arg) > +{ > + struct sgx_vepc *vepc = file->private_data; > + > + switch (cmd) { > + case SGX_IOC_VEPC_REMOVE_ALL: > + if (arg) > + return -EINVAL; > + return sgx_vepc_remove_all(vepc); > + > + default: > + return -ENOTTY; > + } > +} > + > static const struct file_operations sgx_vepc_fops = { > .owner = THIS_MODULE, > .open = sgx_vepc_open, > + .unlocked_ioctl = sgx_vepc_ioctl, > + .compat_ioctl = sgx_vepc_ioctl, > .release = sgx_vepc_release, > .mmap = sgx_vepc_mmap, > }; I went through this a few times, the code change is sound and reasoning makes sense in the commit message. The only thing that I think that is IMHO lacking is a simple kselftest. I think a trivial test for SGX_IOC_VEP_REMOVE_ALL would do. /Jarkko