+Cc Andy On Mon, Oct 14, 2019 at 08:43:09AM +0000, Jethro Beekman wrote: > On 2019-10-11 20:15, Sean Christopherson wrote: > > On Fri, Oct 11, 2019 at 04:37:25PM +0000, Jethro Beekman wrote: > >> UAPI: > >> > >> This got a whole lot more complex for userspace compared to the out-of-tree > >> driver. > >> > >> 1. Manually needing to mmap a naturally-aligned memory region by allocating > >> too much memory and then unmapping parts is quite annoying. Why was the > >> auto-aligning removed? I think this will need to be handled the same for > >> every consumer of SGX, so I don't see why this is not handled in the kernel. > >> It never seems wrong to align if NULL is passed as the requested address. > >> Alternatively, is there room in the flags for a MAP_ALIGNED bit? > > > > I'm pretty sure everyone agrees it's annoying. The short of it is that > > the SGX driver is the wrong place to do the alignment. The driver could > > key off addr=0, but we don't want to take on that implicit behavior. > > Why not? Because it's a hack. If a MAP_ALIGNED flag is added then SGX is stuck with kludgy code that serves no purpose. And userspace needs to manually align the result if it provides an actual hint. Regardless of whether there are use cases for providing a hint for ELRANGE, having divergent logic is ugly. > > A MAP_ALIGNED flag to have the allocation be naturally aligned is the > > ideal solution. It's definitely something we should pursue, but that can > > and probably should be done in parallel to the SGX series. > > > >> 2. Having to re-open the device for every enclave is also annoying. This > >> means you need a filesystem available throughout the process lifetime. I > >> tried dup, but that doesn't work. Can we make dup work? > > > > The semantics of dup() won't get you what want, as dup() just creates a > > new descriptor pointing at the same file. > > > > An alternative solution that was proposed was to have an ioctl() for > > creating an enclave. But that means using an anonymous inode, which runs > > afoul of SELinux permissions, e.g. every _process_ that runs enclaves > > would require EXECMEM. Linus was quite clear that SGX wouldn't be merged > > if using it required users to degrade existing security. > > It's ok if it's the same inode, it just needs to be a different struct file. > > > I'm open to other ideas. I wasn't aware this was a pain point and file > > stuff isn't exactly my area of expertise, so I haven't put much/any > > thought into alternatives. > > The default permissions for /dev/sgx/enclave are root-only. This means you > want to be able to do the same thing as network servers: initialize some > resources as root, then drop privileges. This used to mean opening /dev/sgx > and keeping the fd around which meant you could launch enclaves at will. With > the new API, this is no longer possible, you can only launch one enclave per > fd. Is there a different type of operation that doesn't just duplicate the fd > but also the struct file? If not, can we add an ioctl for that? My approach to this would be to chown /dev/sgx/enclave so that it's owned by root but accessible to users belonging to an sgx-specific group, e.g. via a udev rule. > There are other scenarios where it's not just the permissions on > /dev/sgx/enclave that are the problem but using the filesystem in general > that is. Maybe you've used seccomp to disable file operations, etc. Andy and Jarkko, thoughts? > >> 3. Needing to mprotect every page with the precise permissions needed after > >> EINIT is really bad. This means I have to remember this data for every page > >> between EADD and EINIT. I don't care about SELinux, I trust the ECPM will do > >> its job for me. Can we make it so that I can protect the whole range at once, > >> or protect the individual pages at EADD time? > > > > You can mprotect() or mmap(..., MAP_FIXED) an enclave range once all > > pages covered by the specified range have been added to the enclave, i.e. > > at EADD. I double checked this with the selftest. Holler if you're > > seeing different behavior. > > I'd swear I tried this flow and I was getting EACCES. But I implemented it > again now, and it works fine. So this is not an issue. I wouldn't be surprised if it was broken at some point. > I also saw mmap(..., MAP_FIXED) being used in the selftest. Is there a reason > to use this over mprotect? To test that SGX handles MAP_FIXED correctly. At one point the driver was ignoring it. > >> VDSO: > >> > >> It is *difficult* to link to weakly link to a symbol in the VDSO. Anyway, I > >> figured it out. > >> > >> 1. What if I don't want to automatically ERESUME after kernel interrupt? > > > > Do EENTER/ERESUME directly instead of going through the vDSO. > > That kind of defeats the point. Of providing the vDSO? My stance on not allowing userspace fine-grained control over asynchronous enclave exists hasn't changed since we last discussed the issue[1]. Preventing userspace from hooking interrupts can help mitigate certain attacks, and it also aligns with normal userspace, which doesn't get to hook interrupts either. [1] https://lkml.kernel.org/r/20181102171251.GE7393@xxxxxxxxxxxxxxx > >> 2. I normally do a sanity check after ENCLU[EENTER] that EAX = EEXIT. The > >> current implementation just clears EAX instead without looking at it. > > > > Hmm, the only reason I can think of for checking EAX would be to support > > userspace mucking with EAX in a #DB/#BP signal handler. At that point, I > > would expect the signal handler to modify RIP as well. Reaching the XOR > > via any other non-EEXIT path would require a kernel bug. > > Or a CPU bug. Eh, it's possible, but odds are very high that a CPU bug in this area would have other side effects, i.e. would be noticed quite quickly even without an explicit check. > > Was there a specific scenario or use case you had in mind? I'm not > > against adding a check, I just don't see what value it would provide. > > Nothing specific. It just seems like a prudent thing to do when messing with > control flow in unexpected ways. Alternatively, just remove the xor, and > change the API of the function so that 3 is the normal return value? Then the > user can decide themselves if they think the check is worth it. Andy, thoughts? We also never bottomed out what should be returned when an exception is reported to the caller[2], i.e. if there's a better option than -EFAULT. [2] https://lkml.kernel.org/r/90D05734-1583-4306-A9A4-18E4A1390F3B@xxxxxxxxxxxxxx