On Sat, Oct 03, 2020 at 04:39:25PM +0200, Greg KH wrote: > On Sat, Oct 03, 2020 at 07:50:46AM +0300, Jarkko Sakkinen wrote: > > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can > > be used by applications to set aside private regions of code and data. The > > code outside the SGX hosted software entity is prevented from accessing the > > memory inside the enclave by the CPU. We call these entities enclaves. > > > > Add a driver that provides an ioctl API to construct and run enclaves. > > Enclaves are constructed from pages residing in reserved physical memory > > areas. The contents of these pages can only be accessed when they are > > mapped as part of an enclave, by a hardware thread running inside the > > enclave. > > > > The starting state of an enclave consists of a fixed measured set of > > pages that are copied to the EPC during the construction process by > > using the opcode ENCLS leaf functions and Software Enclave Control > > Structure (SECS) that defines the enclave properties. > > > > Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and > > EINIT. ECREATE initializes SECS, EADD copies pages from system memory to > > the EPC and EINIT checks a given signed measurement and moves the enclave > > into a state ready for execution. > > > > An initialized enclave can only be accessed through special Thread Control > > Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER. This leaf > > function converts a thread into enclave mode and continues the execution in > > the offset defined by the TCS provided to EENTER. An enclave is exited > > through syscall, exception, interrupts or by explicitly calling another > > ENCLU leaf EEXIT. > > > > The mmap() permissions are capped by the contained enclave page > > permissions. The mapped areas must also be populated, i.e. each page > > address must contain a page. This logic is implemented in > > sgx_encl_may_map(). > > > > Cc: linux-security-module@xxxxxxxxxxxxxxx > > Cc: linux-mm@xxxxxxxxx > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > Acked-by: Jethro Beekman <jethro@xxxxxxxxxxxx> > > Tested-by: Jethro Beekman <jethro@xxxxxxxxxxxx> > > Tested-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx> > > Tested-by: Chunyang Hui <sanqian.hcy@xxxxxxxxxx> > > Tested-by: Jordan Hand <jorhand@xxxxxxxxxxxxxxxxxxx> > > Tested-by: Nathaniel McCallum <npmccallum@xxxxxxxxxx> > > Tested-by: Seth Moore <sethmo@xxxxxxxxxx> > > Tested-by: Darren Kenny <darren.kenny@xxxxxxxxxx> > > Reviewed-by: Darren Kenny <darren.kenny@xxxxxxxxxx> > > Co-developed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Co-developed-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx> > > Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx> > > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> > > --- > > arch/x86/kernel/cpu/sgx/Makefile | 2 + > > arch/x86/kernel/cpu/sgx/driver.c | 173 ++++++++++++++++ > > arch/x86/kernel/cpu/sgx/driver.h | 29 +++ > > arch/x86/kernel/cpu/sgx/encl.c | 331 +++++++++++++++++++++++++++++++ > > arch/x86/kernel/cpu/sgx/encl.h | 85 ++++++++ > > arch/x86/kernel/cpu/sgx/main.c | 11 + > > 6 files changed, 631 insertions(+) > > create mode 100644 arch/x86/kernel/cpu/sgx/driver.c > > create mode 100644 arch/x86/kernel/cpu/sgx/driver.h > > create mode 100644 arch/x86/kernel/cpu/sgx/encl.c > > create mode 100644 arch/x86/kernel/cpu/sgx/encl.h > > > > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile > > index 79510ce01b3b..3fc451120735 100644 > > --- a/arch/x86/kernel/cpu/sgx/Makefile > > +++ b/arch/x86/kernel/cpu/sgx/Makefile > > @@ -1,2 +1,4 @@ > > obj-y += \ > > + driver.o \ > > + encl.o \ > > main.o > > diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c > > new file mode 100644 > > index 000000000000..f54da5f19c2b > > --- /dev/null > > +++ b/arch/x86/kernel/cpu/sgx/driver.c > > @@ -0,0 +1,173 @@ > > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > > You use gpl-only header files in this file, so how in the world can it > be bsd-3 licensed? > > Please get your legal department to agree with this, after you explain > to them how you are mixing gpl2-only code in with this file. I'll do what I already stated that I will do. Should I do something more? > > +// Copyright(c) 2016-18 Intel Corporation. > > Dates are hard to get right :( Will fix. > > > + > > +#include <linux/acpi.h> > > +#include <linux/miscdevice.h> > > +#include <linux/mman.h> > > +#include <linux/security.h> > > +#include <linux/suspend.h> > > +#include <asm/traps.h> > > +#include "driver.h" > > +#include "encl.h" > > + > > +u64 sgx_encl_size_max_32; > > +u64 sgx_encl_size_max_64; > > +u32 sgx_misc_reserved_mask; > > +u64 sgx_attributes_reserved_mask; > > +u64 sgx_xfrm_reserved_mask = ~0x3; > > +u32 sgx_xsave_size_tbl[64]; > > + > > +static int sgx_open(struct inode *inode, struct file *file) > > +{ > > + struct sgx_encl *encl; > > + int ret; > > + > > + encl = kzalloc(sizeof(*encl), GFP_KERNEL); > > + if (!encl) > > + return -ENOMEM; > > + > > + atomic_set(&encl->flags, 0); > > + kref_init(&encl->refcount); > > + xa_init(&encl->page_array); > > + mutex_init(&encl->lock); > > + INIT_LIST_HEAD(&encl->mm_list); > > + spin_lock_init(&encl->mm_lock); > > + > > + ret = init_srcu_struct(&encl->srcu); > > + if (ret) { > > + kfree(encl); > > + return ret; > > + } > > + > > + file->private_data = encl; > > + > > + return 0; > > +} > > + > > +static int sgx_release(struct inode *inode, struct file *file) > > +{ > > + struct sgx_encl *encl = file->private_data; > > + struct sgx_encl_mm *encl_mm; > > + > > + for ( ; ; ) { > > + spin_lock(&encl->mm_lock); > > + > > + if (list_empty(&encl->mm_list)) { > > + encl_mm = NULL; > > + } else { > > + encl_mm = list_first_entry(&encl->mm_list, > > + struct sgx_encl_mm, list); > > + list_del_rcu(&encl_mm->list); > > + } > > + > > + spin_unlock(&encl->mm_lock); > > + > > + /* The list is empty, ready to go. */ > > + if (!encl_mm) > > + break; > > + > > + synchronize_srcu(&encl->srcu); > > + mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm); > > + kfree(encl_mm); > > + } > > + > > + mutex_lock(&encl->lock); > > + atomic_or(SGX_ENCL_DEAD, &encl->flags); > > So you set a flag that this is dead, and then instantly delete it? Why > does that matter? I see you check for this flag elsewhere, but as you > are just about to delete this structure, how can this be an issue? It matters because ksgxswapd (sgx_reclaimer_*) might be processing it. It will use the flag to skip the operations that it would do to a victim page, when the enclave is still alive. > > > + mutex_unlock(&encl->lock); > > + > > + kref_put(&encl->refcount, sgx_encl_release); > > Don't you need to hold the lock across the put? If not, what is > serializing this? > > But an even larger comment, why is this reference count needed at all? > > You never grab it except at init time, and you free it at close time. > Why not rely on the reference counting that the vfs ensures you? Because ksgxswapd needs the alive enclave instance while it is in the process of swapping a victim page. The reason for this is the hierarchical nature of the enclave pages. As an example, a write operation to main memory, EWB (SDM vol 3D 40-79) needs to access SGX Enclave Control Structure (SECS) page, which is contains global data for an enclave, like the unswapped child count. > > + return 0; > > +} > > + > > +static int sgx_mmap(struct file *file, struct vm_area_struct *vma) > > +{ > > + struct sgx_encl *encl = file->private_data; > > + int ret; > > + > > + ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, vma->vm_flags); > > + if (ret) > > + return ret; > > + > > + ret = sgx_encl_mm_add(encl, vma->vm_mm); > > + if (ret) > > + return ret; > > + > > + vma->vm_ops = &sgx_vm_ops; > > + vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; > > + vma->vm_private_data = encl; > > + > > + return 0; > > +} > > + > > +static unsigned long sgx_get_unmapped_area(struct file *file, > > + unsigned long addr, > > + unsigned long len, > > + unsigned long pgoff, > > + unsigned long flags) > > +{ > > + if ((flags & MAP_TYPE) == MAP_PRIVATE) > > + return -EINVAL; > > + > > + if (flags & MAP_FIXED) > > + return addr; > > + > > + return current->mm->get_unmapped_area(file, addr, len, pgoff, flags); > > +} > > + > > +static const struct file_operations sgx_encl_fops = { > > + .owner = THIS_MODULE, > > + .open = sgx_open, > > + .release = sgx_release, > > + .mmap = sgx_mmap, > > + .get_unmapped_area = sgx_get_unmapped_area, > > +}; > > + > > +static struct miscdevice sgx_dev_enclave = { > > + .minor = MISC_DYNAMIC_MINOR, > > + .name = "enclave", > > + .nodename = "sgx/enclave", > > A subdir for a single device node? Ok, odd, but why not just > "sgx_enclave"? How "special" is this device node? There is a patch that adds "sgx/provision". Either works for me. Should I flatten them to "sgx_enclave" and "sgx_provision", or keep them as they are? > thanks, > > greg k-h /Jarkko