Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2021-10-12 at 06:57 -0400, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.
> 
> Some userspace implementations of virtual SGX would rather avoid having
> to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
> virtual EPC.  For example, they could sandbox themselves after the guest
> starts and forbid further calls to open(), in order to mitigate exploits
> from untrusted guests.
> 
> Therefore, add a ioctl that does this with EREMOVE.  Userspace can
> invoke the ioctl to bring its vEPC pages back to uninitialized state.
> There is a possibility that some pages fail to be removed if they are
> SECS pages, and the child and SECS pages could be in separate vEPC
> regions.  Therefore, the ioctl returns the number of EREMOVE failures,
> telling userspace to try the ioctl again after it's done with all
> vEPC regions.  A more verbose description of the correct usage and
> the possible error conditions is documented in sgx.rst.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
>         v1->v2: return EBUSY for SGX_ENCLAVE_ACT, adjust docs
>                 add cond_resched
> 
>  Documentation/x86/sgx.rst       | 26 +++++++++++++++
>  arch/x86/include/asm/sgx.h      |  3 ++
>  arch/x86/include/uapi/asm/sgx.h |  2 ++
>  arch/x86/kernel/cpu/sgx/virt.c  | 57 +++++++++++++++++++++++++++++++++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index dd0ac96ff9ef..7bc9c3b297d6 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests
>  on the same machine, the user should reserve enough EPC (by taking out
>  total virtual EPC size of all SGX VMs from the physical EPC size) for
>  host SGX applications so they can run with acceptable performance.
> +
> +Architectural behavior is to restore all EPC pages to an uninitialized
> +state also after a guest reboot.  Because this state can be reached only
> +through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
> +provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
> +on all pages in the virtual EPC.
> +
> +``EREMOVE`` can fail for two reasons, which Linux relays to userspace
> +in a different manner:
> +
> +1. Page removal will always fail when any thread is running in the
> +   enclave to which the page belongs.  In this case the ioctl will
> +   return ``EBUSY`` independent of whether it has successfully removed
> +   some pages; userspace can avoid these failures by preventing execution
> +   of any vcpu which maps the virtual EPC.
> +
> +2) Page removal will also fail for SGX "SECS" metadata pages which still
> +   have child pages.  Child pages can be removed by executing
> +   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
> +   mapped into the guest.  This means that the ioctl() must be called
> +   twice: an initial set of calls to remove child pages and a subsequent
> +   set of calls to remove SECS pages.  The second set of calls is only
> +   required for those mappings that returned a nonzero value from the
> +   first call.  It indicates a bug in the kernel or the userspace client
> +   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
> +   a return code other than 0.
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 05f3e21f01a7..2e5d8c97655e 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -50,6 +50,8 @@ enum sgx_encls_function {
>   * %SGX_NOT_TRACKED:           Previous ETRACK's shootdown sequence has not
>   *                             been completed yet.
>   * %SGX_CHILD_PRESENT          SECS has child pages present in the EPC.
> + * %SGX_ENCLAVE_ACT            One or more logical processors are executing
> + *                             inside the enclave.
>   * %SGX_INVALID_EINITTOKEN:    EINITTOKEN is invalid and enclave signer's
>   *                             public key does not match IA32_SGXLEPUBKEYHASH.
>   * %SGX_UNMASKED_EVENT:                An unmasked event, e.g. INTR, was received
> @@ -57,6 +59,7 @@ enum sgx_encls_function {
>  enum sgx_return_code {
>         SGX_NOT_TRACKED                 = 11,
>         SGX_CHILD_PRESENT               = 13,
> +       SGX_ENCLAVE_ACT                 = 14,
>         SGX_INVALID_EINITTOKEN          = 16,
>         SGX_UNMASKED_EVENT              = 128,
>  };
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 9690d6899ad9..f4b81587e90b 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -27,6 +27,8 @@ enum sgx_page_flags {
>         _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
>  #define SGX_IOC_ENCLAVE_PROVISION \
>         _IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
> +#define SGX_IOC_VEPC_REMOVE_ALL \
> +       _IO(SGX_MAGIC, 0x04)
>  
>  /**
>   * struct sgx_enclave_create - parameter structure for the
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 59cdf3f742ac..81a0a0f22007 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>         return 0;
>  }
>  
> +static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
> +{
> +       struct sgx_epc_page *entry;
> +       unsigned long index;
> +       long failures = 0;
> +
> +       xa_for_each(&vepc->page_array, index, entry) {
> +               int ret = sgx_vepc_remove_page(entry);
> +               switch (ret) {
> +               case 0:
> +                       break;
> +
> +               case SGX_CHILD_PRESENT:
> +                       failures++;
> +                       break;
> +
> +               case SGX_ENCLAVE_ACT:
> +                       /*
> +                        * Unlike in sgx_vepc_free_page, userspace could be calling
> +                        * the ioctl while logical processors are running in the
> +                        * enclave; do not warn.
> +                        */
> +                       return -EBUSY;
> +
> +               default:
> +                       WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret);
> +                       failures++;
> +                       break;
> +               }
> +               cond_resched();
> +       }
> +
> +       /*
> +        * Return the number of pages that failed to be removed, so
> +        * userspace knows that there are still SECS pages lying
> +        * around.
> +        */
> +       return failures;
> +}
> +
>  static int sgx_vepc_release(struct inode *inode, struct file *file)
>  {
>         struct sgx_vepc *vepc = file->private_data;
> @@ -235,9 +274,27 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
>         return 0;
>  }
>  
> +static long sgx_vepc_ioctl(struct file *file,
> +                          unsigned int cmd, unsigned long arg)
> +{
> +       struct sgx_vepc *vepc = file->private_data;
> +
> +       switch (cmd) {
> +       case SGX_IOC_VEPC_REMOVE_ALL:
> +               if (arg)
> +                       return -EINVAL;
> +               return sgx_vepc_remove_all(vepc);
> +
> +       default:
> +               return -ENOTTY;
> +       }
> +}
> +
>  static const struct file_operations sgx_vepc_fops = {
>         .owner          = THIS_MODULE,
>         .open           = sgx_vepc_open,
> +       .unlocked_ioctl = sgx_vepc_ioctl,
> +       .compat_ioctl   = sgx_vepc_ioctl,
>         .release        = sgx_vepc_release,
>         .mmap           = sgx_vepc_mmap,
>  };

I went through this a few times, the code change is sound and
reasoning makes sense in the commit message.

The only thing that I think that is IMHO lacking is a simple
kselftest. I think a trivial test for SGX_IOC_VEP_REMOVE_ALL
would do.

/Jarkko





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux