On Fri, 2021-08-20 at 01:37 +0200, Michael Kerrisk (man-pages) wrote:
> Hello Jarkko
>
> On 8/10/21 11:16 PM, Jarkko Sakkinen wrote:
> > Cc: linux-man@xxxxxxxxxxxxxxx
> > Cc: linux-sgx@xxxxxxxxxxxxxxx
> > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > Cc: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> > Signed-off-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > ---
> >
> > v7:
> > * Added more meat about the address space and API.
> > * Reorganized the text to have focus more on developer to have a big
> >   picture of kernel provided interfaces.
> > v6:
> > * Small fixes based on Dave's and Reinette's feedback.
> > * Extended the "Permissions" section to cover mmap()
> > v5:
> > * Taking away hardware concepts and focusing more on the interface.
> > v4:
> > * Did a heavy edit trying to streamline the story a bit and focus on
> >   stuff important to the user (e.g. lighten up x86 details).
> > v3:
> > * Overhaul based on Michael's comments. Most likely needs to be refined
> >   in various places but this is at least a small step forward for sure.
> > v2:
> > * Fixed the semantic newlines convention and various style errors etc.
> >   that were reported by Alenjandro and Michael.
> > * SGX was merged to v5.
> >
> >  man7/sgx.7 | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 156 insertions(+)
> >  create mode 100644 man7/sgx.7
> >
> > diff --git a/man7/sgx.7 b/man7/sgx.7
> > new file mode 100644
> > index 000000000..ab5a504fa
> > --- /dev/null
> > +++ b/man7/sgx.7
> > @@ -0,0 +1,156 @@
> > +.\" Copyright (C) 2021 Intel Corporation
> > +.\"
> > +.\" %%%LICENSE_START(VERBATIM)
> > +.\" Permission is granted to make and distribute verbatim copies of this
> > +.\" manual provided the copyright notice and this permission notice are
> > +.\" preserved on all copies.
> > +.\"
> > +.\" Permission is granted to copy and distribute modified versions of this
> > +.\" manual under the conditions for verbatim copying, provided that the
> > +.\" entire resulting derived work is distributed under the terms of a
> > +.\" permission notice identical to this one.
> > +.\"
> > +.\" Since the Linux kernel and libraries are constantly changing, this
> > +.\" manual page may be incorrect or out-of-date. The author(s) assume no
> > +.\" responsibility for errors or omissions, or for damages resulting from
> > +.\" the use of the information contained herein. The author(s) may not
> > +.\" have taken the same level of care in the production of this manual,
> > +.\" which is licensed free of charge, as they might when working
> > +.\" professionally.
> > +.\"
> > +.\" Formatted or processed versions of this manual, if unaccompanied by
> > +.\" the source, must acknowledge the copyright and authors of this work.
> > +.\" %%%LICENSE_END
> > +.\"
> > +.TH SGX 7 2021\-02\-02 "Linux" "Linux Programmer's Manual"
> > +.PP
> > +sgx - overview of Software Guard eXtensions
> > +.SH SYNOPSIS
> > +.EX
> > +.B #include <asm/sgx.h>
> > +.PP
> > +.IB enclave " = open(""/dev/sgx_enclave", " O_RDWR);"
> > +.EE
> > +.SH DESCRIPTION
> > +Intel Software Guard eXtensions (SGX) allow applications to host,
>
> s/host,/host/

+1

> > +enclaves,
> > +protected executable objects in memory.
> > +As software entities enclaves are instances of
>
> s/entities/entities,/

+1

> > +.I /dev/sgx_enclave.
>
> What does the previous sentence mean? I'm sorry, it's
> not very clear.

OK, I'll try to explain.

An enclave is a blob of executable code that runs inside a CPU-enforced
"container", which is mapped into the address space of the process. An
enclave has a fixed set of entry points, defined when the enclave is
created. An entry point can be passed to EENTER, a subfunction of the
ENCLU opcode, which switches a hardware thread to enclave mode and
starts executing inside the enclave.
Any access to enclave memory from a thread outside the enclave causes a
segfault. A process hosting an enclave needs to handle the invalid
opcode CPU exceptions that the enclave generates, because many common
opcodes, e.g. SYSCALL and RDTSC, are not allowed inside enclaves. That's
why EENTER is wrapped inside a vDSO. When a CPU exception happens, the
kernel and the vDSO code pass the exception number, error code and
address back to the caller. The caller can then, for example, act as a
delegate for a syscall.

Any opcode that would cause a jump outside the enclave causes a CPU
exception. The only explicit way to leave the enclave is EEXIT, a
subfunction of the ENCLU opcode, which takes the hardware thread out of
enclave mode and jumps to the given target address.

>
> > +.PP
> > +SGX can be available only if the kernel was configured and built with the
>
> s/can be/is/

+1

> > +.B CONFIG_X86_SGX
> > +option.
> > +You can verify that both the kernel and hardware have SGX enabled by
> > +checking that "sgx" appears in the
> > +.I flags
> > +field in
> > +.IR /proc/cpuinfo .
> > +.PP
> > +SGX must be enabled in BIOS.
> > +If SGX appears to be unsupported,
> > +ensure that SGX is enabled in the BIOS.
> > +If a BIOS presents a choice between
> > +.I Enabled
> > +and
> > +.I Software Enabled
> > +modes for SGX,
> > +choose
> > +.I Enabled.
> > +.PP
> > +Enclaves are shared objects, meaning that
> > +they can be shared with a
> > +.BR cmsg (3),
>
> How do they get shared with cmg()? This is unclear?
> Do you mean by passing a file descriptor over a UNIX
> domain socket, or something else?

Yes, by passing a file descriptor over a UNIX domain socket.

> > +and inherited in a fork.
> > +.SS Address space
> > +The address range for an enclave must be reserved with
> > +.BR mmap (2).
> > +This must happen before the enclave construction can begin,
> > +because the enclave page addresses are fixed during its build time.
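
For reference, here is a minimal sketch of the reservation step as I
understand it, written so that it also checks the size rules from the
next paragraph of the patch. The helper name reserve_enclave_range() is
made up for illustration and is not part of any SGX API, and the
page-size check assumes the "at least one page" rule discussed in this
thread. It casts through uintptr_t so that the aligned address can be
turned back into a pointer cleanly.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an SGX API): reserve 2 * size bytes of
 * inaccessible address space and return the first address inside the
 * reservation that is naturally aligned to size. Returns NULL if size
 * is not a power of two, is smaller than a page, or if mmap() fails.
 * The unused head and tail of the reservation are not unmapped here.
 */
static void *reserve_enclave_range(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	void *area;
	uintptr_t base;

	/* Size must be a power of two and at least one page. */
	if (page <= 0 || size < (size_t)page || (size & (size - 1)) != 0)
		return NULL;

	/* Guard against overflow in size * 2. */
	if (size > SIZE_MAX / 2)
		return NULL;

	area = mmap(NULL, size * 2, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return NULL;

	/* Round up to the first address aligned to size. */
	base = ((uintptr_t)area + size - 1) & ~((uintptr_t)size - 1);

	/* The aligned range always fits inside the doubled reservation. */
	assert(base >= (uintptr_t)area);
	assert(base + size <= (uintptr_t)area + size * 2);

	return (void *)base;
}
```

Reserving twice the size is what guarantees that an aligned range of
the requested size exists somewhere inside the mapping.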
> > +.PP
> > +The CPU requires the size of the enclave to be power of two,
>
> Must the size also be >= page size? If so, that should be
> mentioned.

Yes, I changed this to:

"
The CPU requires the size of the enclave to be a power of two
and at least the size of one page,
and the base address to be naturally aligned with the size.
"

> > +and the base address to be naturally aligned with the size.
> > +An appropriate address range can be found by an anonymous mapping:
> > +.PP
> > +.EX
> > +void *area = mmap(NULL, size * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
> > +                  -1, 0);
> > +
> > +/* Find the first address aligned to the size within the range. */
> > +void *base = ((uint64_t)area + size - 1) & ~(size - 1);
> > +.EE
> > +.PP
> > +.SS Ioctls
> > +Enclaves are managed with the
> > +.BR ioctl (2)
> > +interface defined and documented in
> > +.IR <asm/sgx.h>:
> > +.TP
> > +.IB SGX_IOC_ENCLAVE_CREATE
> > +Create SGX Enclave Control Structure (SECS) for an enclave.
> > +SECS is a hardware defined structure,
> > +which contains the properties of an enclave,
> > +such as its base address and size.
>
> Add a sentence:
>
> [[
> The ioctl argument has the type
> .IR "struct\ *sgx_enclave_create" .
> ]]
>
> (I think this sentence helps the reader orient a little,
> when reading the header file.)

+1

> > +.TP
> > +.IB SGX_IOC_ENCLAVE_ADD_PAGES
> > +Populate a range of enclave pages with the page data provided by the caller.
>
> Add a sentence:
>
> [[
> The ioctl argument has the type
> .IR "struct\ *sgx_enclave_add_pages" .
> ]]

+1

> > +.TP
> > +.IB SGX_IOC_ENCLAVE_INIT
> > +Tell the CPU the prepare the enclave for use.
>
> s/the prepare/to prepare/

+1

> > +After a successful initialization,
> > +no new pages can be added to the enclave.
>
> Add a sentence:
>
> [[
> The ioctl argument has the type
> .IR "struct\ *sgx_enclave_init" .
> ]]

+1

> Just looking in the header file, does SGX_IOC_ENCLAVE_PROVISION
> also need to be mentioned?

It's a topic of its own.
I would focus on these basic ioctls for now, and contribute that part
once these are in shape. It's disjoint in the sense that you can create
and run enclaves without provisioning.

> > +.PP
> > +The details of what these operations actually mean in the hardware level can be
>
> s/in/at/

+1

> > +found in the Intel Software Developers Manual.
> > +.SS vDSO
> > +A process can access enclave by entering into its address space through
>
> s/enclave/an enclave/

+1

> > +a set of entry points,
> > +which must be defined during the construction process.
> > +This requires a complex sequence of CPU instructions,
> > +and kernel assisted exception handling.
> > +For these reasons,
> > +it is encapsulated into
> > +vDSO interface,
>
> s/vDSO/a vDSO/

+1

> > +provided by
> > +.BR vdso_sgx_enter_enclave_t,
>
> s/,/ ,/

+1

> > +which is located in
>
> s/located/declared/ ?

+1

> > +.IR <asm/sgx.h>.
> > +.SS Permissions
> > +In order to build an enclave, a process must be able to call
> > +.IR mmap (2)
> > +with
> > +.IR PROT_EXEC
> > +set,
> > +because like for any other type of executable,
>
> s/because like for any/because, as with any/

+1

> > +the page table permissions must be set appropriately.
>
> s/permissions/protections/

+1

> > +Therefore,
> > +.I /dev/sgx_enclave
> > +must reside in a partition,
> > +which is not mounted with
> > +.B noexec
> > +set in the mount options.
> > +.PP
> > +During the build process each enclave page is assigned protection bits,
> > +as part of
> > +.BR ioctl(SGX_IOC_ENCLAVE_ADD_PAGES).
>
> In the previous sentence, you use "protections". In the following
> sentences you use "permissions". Best to be consistent. Let's
> use "protections", as per mmap(2).

OK, I'll use that consistently from now on.

> > +These permissions are also the maximum permissions to which the page can be be mapped.
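
To make the "maximum protections" rule concrete, it can be modelled as
a subset check on the PROT_* bits. This is only an illustrative model,
not the kernel's implementation, and prot_within_max() is a made-up
name. It assumes that only PROT_READ, PROT_WRITE and PROT_EXEC take
part in the comparison.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical model (not kernel code): returns true if the protections
 * requested for a mapping do not exceed the maximum protections that
 * were assigned to the page when it was added to the enclave.
 */
static bool prot_within_max(int max_prot, int req_prot)
{
	const int mask = PROT_READ | PROT_WRITE | PROT_EXEC;

	/* Every requested bit must also be present in the maximum. */
	return ((req_prot & mask) & ~(max_prot & mask)) == 0;
}
```

In this model a page added with PROT_READ | PROT_EXEC can be mapped
read-only, but a request that adds PROT_WRITE is rejected.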
> > +If
> > +.BR mmap (2)
> > +is called with surpassing permissions,
>
> s/surpassing/unexpected/
> s/permissions/protections/

Thanks, it's now all about protections.

> > +it will return
> > +.B -EACCES.
> > +If
> > +.BR ioctl(SGX_IOC_ENCLAVE_ADD_PAGES)
> > +is called after
> > +.BR mmap (2)
> > +with lower permissions,
> > +the process will eventually receive a
>
> I want to check the wording here, since "eventually"
> is often a false friend for nonnative speakers.
>
> In English, "eventually" means: at some (unknown) time in the future,
> something WILL happen. In other words, the happening is certain, but
> the timing is not. In various European languages, a similar sounding word
> in many cases has the sense of "possibly" or "may happen". Which do
> you mean to say?

It's actually the latter. SIGBUS is delivered exactly when a thread
executing inside an enclave accesses a page whose protections, as
defined at build time, are lower than those of the VMA. If that page is
never accessed, the enclave executes gracefully.

The idea is to keep an invariant that the protections defined during
the creation cannot be exceeded. If and when we consider hooking into
the various LSMs, the access control decisions need to happen at build
time. That's why we need to have this type of invariant already in
place.

Thank you for the feedback. I marked just "+1" for trivial fixups.

/Jarkko