On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> Hi Sean,
>
> > From: Christopherson, Sean J
> > Sent: Friday, May 31, 2019 4:32 PM
> >
> > This series is the result of a rather absurd amount of discussion over how to get SGX to play
> > nice with LSM policies, without having to resort to evil shenanigans or put undue burden on
> > userspace. The discussion definitely wandered into completely insane territory at times, but
> > I think/hope we ended up with something reasonable.
> >
> > The basic gist of the approach is to require userspace to declare what protections are
> > maximally allowed for any given page, e.g. add a flags field for loading enclave pages that
> > takes ALLOW_{READ,WRITE,EXEC}. LSMs can then adjust the allowed protections, e.g. clear
> > ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC. SGX enforces the allowed perms
> > via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
> >
> > ALLOW_EXEC is used to deny things like loading an enclave from a noexec file system or from a
> > file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on SGX2 hardware
> > (regardless of kernel support) userspace could EADD from a noexec file using read-only
> > permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
> >
> > ALLOW_WRITE is used in conjunction with ALLOW_EXEC to enforce SELinux's EXECMOD (or EXECMEM).
> >
> > This is very much an RFC series. It's only compile tested, likely has obvious bugs, the
> > SELinux patch could be completely harebrained, etc... My goal at this point is to get
> > feedback at a macro level, e.g. is the core concept viable/acceptable, are there
> > objections to hooking mprotect(), etc...
> >
> > Andy and Cedric, hopefully this aligns with your general expectations based on our last
> > discussion.
>
> I couldn't understand the real intentions of ALLOW_* flags until I saw them
> in code.
> I have to say C is more expressive than English in that regard :)
>
> Generally I agree with your direction but think ALLOW_* flags are completely
> internal to LSM because they can be both produced and consumed inside an LSM
> module. So spilling them into SGX driver and also user mode code makes the
> solution ugly and in some cases impractical because not every enclave host
> process has a priori knowledge on whether or not an enclave page would be
> EMODPE'd at runtime.

In this case, the host process should tag *all* pages it *might* convert to
executable as ALLOW_EXEC. LSMs can (and should/will) be written in such a way
that denying ALLOW_EXEC is fatal to the enclave if and only if the enclave
actually attempts mprotect(PROT_EXEC).

Take the SELinux path for example. The only scenario in which PROT_WRITE is
cleared from @allowed_prot is if the page *starts* with PROT_EXEC. If
PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page, then
PROT_EXEC will be cleared from @allowed_prot.

As Stephen pointed out, auditing the denials on @allowed_prot means the log
will contain false positives of a sort. But this is more of a noise issue
than true false positives. E.g. there are three possible outcomes for the
enclave.

  - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
    Requesting ALLOW_EXEC is either a straightforward userspace bug or a
    poorly written generic enclave loader.

  - The enclave conditionally performs EMODPE[PROT_EXEC]. In this case the
    denial is a true false positive.

  - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails on
    mprotect(PROT_EXEC), i.e. the LSM denial is working as intended. The
    audit log will be noisy, but viewed as a whole the denials aren't false
    positives.

The potential for noisy audit logs and/or false positives is unfortunate,
but it's (by far) the lesser of many evils.

> Theoretically speaking, what you really need is a per page flag (let's name
> it WRITTEN?)
> indicating whether a page has ever been written to (or more
> precisely, granted PROT_WRITE), which will be used to decide whether to grant
> PROT_EXEC when requested in future. Given the fact that all mprotect() goes
> through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture
> that flag by itself instead of asking user mode code to provide it.
>
> That said, here is the summary of what I think is a better approach.
> * In hook security_file_alloc(), if @file is an enclave, allocate some data
>   structure to store for every page, the WRITTEN flag as described above.
>   WRITTEN is cleared initially for all pages.

This would effectively require *every* LSM to duplicate the SGX driver's
functionality, e.g. track per-page metadata, implement locking to prevent
races between multiple mm structs, etc...

>   Open: Given a file of type struct file *, how to tell if it is an enclave
>   (i.e. /dev/sgx/enclave)?
> * In hook security_mmap_file(), if @file is an enclave, make sure @prot can
>   only be PROT_NONE. This is to force all protection changes to go through
>   security_file_mprotect().
> * In the newly introduced hook security_enclave_load(), set WRITTEN for pages
>   that are requested PROT_WRITE.

How would an LSM associate a page with a specific enclave? vma->vm_file will
always point at /dev/sgx/enclave. vma->vm_mm is useless because we're
allowing multiple processes to map a single enclave, not to mention that
tracking by mm would require holding a reference to the mm.

> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up
>   and use WRITTEN flags for all pages within @vma, along with other global
>   flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide
>   on allowing/rejecting @prot.

vma->vm_file will always be /dev/sgx/enclave at this point, which means LSMs
don't have the necessary anchor back to the source file, e.g. to enforce
FILE__EXECUTE. The noexec file system case is also unaddressed.
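
To make the anchor problem concrete, here is a rough userspace mock of a
load-time check. It is not kernel code and not taken from the series: the
struct, the function name and the MOCK_ALLOW_* values are all made up for
illustration. The only point is that the decision to drop ALLOW_EXEC needs
information about the source file, which is known when the page is loaded
but cannot be recovered from vma->vm_file at mprotect() time.

```c
/*
 * Userspace mock, NOT kernel code.  Every name here is hypothetical and
 * exists only to illustrate why the check has to happen at load time.
 */
#include <assert.h>
#include <stdbool.h>

#define MOCK_ALLOW_READ  0x1u
#define MOCK_ALLOW_WRITE 0x2u
#define MOCK_ALLOW_EXEC  0x4u
#define MOCK_ALLOW_ALL   (MOCK_ALLOW_READ | MOCK_ALLOW_WRITE | MOCK_ALLOW_EXEC)

/* Stand-in for what an LSM can learn about the file a page came from. */
struct mock_source {
	bool on_noexec_mount;   /* source file lives on a noexec file system */
	bool has_execute_perm;  /* policy grants execute on the source file */
};

/*
 * Mock load-time hook: runs while the source of the page is still known
 * and returns the (possibly reduced) set of allowed protections.
 */
static unsigned int mock_enclave_load(const struct mock_source *src,
				      unsigned int allowed)
{
	/* Reject garbage bits; only the three ALLOW_* flags are defined. */
	assert((allowed & ~MOCK_ALLOW_ALL) == 0);

	/* No execute from noexec mounts or non-executable files, ever. */
	if (src->on_noexec_mount || !src->has_execute_perm)
		allowed &= ~MOCK_ALLOW_EXEC;

	return allowed;
}
```

With this shape, a page loaded from a noexec mount with READ|EXEC comes back
as READ only, so a later mprotect(PROT_EXEC) can be refused without having to
find the source file again.
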
> * In hook security_file_free(), if @file is an enclave, free storage
>   allocated for WRITTEN flags.
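
For comparison, the @allowed_prot behavior described above for the SELinux
path can be mocked up in userspace roughly as follows. This is a simplified
sketch, not the code from the series: the names, the MOCK_PROT_* values, the
single execmod_granted flag and the "initial protections are a subset of the
allowed ones" check are all assumptions made for illustration. The per-page
state is a single mask that is fixed at load time, and mprotect() only has to
check that the request is a subset of it.

```c
/*
 * Userspace mock, NOT kernel code.  All names and values are hypothetical;
 * the policy is deliberately reduced to one "execmod_granted" flag.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_PROT_READ  0x1u
#define MOCK_PROT_WRITE 0x2u
#define MOCK_PROT_EXEC  0x4u

/* Per-page state kept by the (mock) SGX driver. */
struct mock_page {
	unsigned int allowed_prot;  /* maximal protections, fixed at load */
};

/*
 * Mock LSM decision at load time.  If the policy does not grant W+X:
 *  - a page that starts executable loses WRITE from the allowed mask;
 *  - a page that starts RW (e.g. an EAUG'd page) loses EXEC instead.
 */
static unsigned int mock_restrict_wx(unsigned int initial_prot,
				     unsigned int allowed,
				     bool execmod_granted)
{
	/* Assumption: a page cannot start with more than it declared. */
	assert((initial_prot & ~allowed) == 0);

	if (execmod_granted)
		return allowed;
	if (!(allowed & MOCK_PROT_WRITE) || !(allowed & MOCK_PROT_EXEC))
		return allowed;  /* W+X was never requested, nothing to do */

	if (initial_prot & MOCK_PROT_EXEC)
		return allowed & ~MOCK_PROT_WRITE;
	return allowed & ~MOCK_PROT_EXEC;
}

/* Mock mprotect() hook: the request must be a subset of the allowed mask. */
static int mock_mprotect(const struct mock_page *page, unsigned int prot)
{
	return (prot & ~page->allowed_prot) ? -EACCES : 0;
}
```

In this mock, a page that starts RW with READ|WRITE|EXEC allowed ends up with
READ|WRITE, so the enclave keeps working until userspace actually asks for
PROT_EXEC, at which point mock_mprotect() returns -EACCES. Nothing has to be
tracked per mm or looked up through vma->vm_file.
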