Hi Stephen, > From: linux-sgx-owner@xxxxxxxxxxxxxxx [mailto:linux-sgx- > owner@xxxxxxxxxxxxxxx] On Behalf Of Stephen Smalley > Sent: Tuesday, June 04, 2019 8:34 AM > > On 6/3/19 2:30 PM, Xing, Cedric wrote: > >> From: Christopherson, Sean J > >> Sent: Monday, June 03, 2019 10:16 AM > >> > >> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote: > >>> Hi Sean, > >>> > >>> Generally I agree with your direction but think ALLOW_* flags are > >>> completely internal to LSM because they can be both produced and > >>> consumed inside an LSM module. So spilling them into SGX driver and > >>> also user mode code makes the solution ugly and in some cases > >>> impractical because not every enclave host process has a priori > >>> knowledge on whether or not an enclave page would be EMODPE'd at > >> runtime. > >> > >> In this case, the host process should tag *all* pages it *might* > convert > >> to executable as ALLOW_EXEC. LSMs can (and should/will) be written > in > >> such a way that denying ALLOW_EXEC is fatal to the enclave if and > only > >> if the enclave actually attempts mprotect(PROT_EXEC). > > > > What if those pages contain self-modifying code but the host doesn't > know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? > Then would it prevent those pages to start with PROT_EXEC? > > > > Anyway, my point is that it is unnecessary even if it works. > > > >> > >> Take the SELinux path for example. The only scenario in which > >> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with > >> PROT_EXEC. > >> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page, > >> then PROT_EXEC will be cleared from @allowed_prot. > >> > >> As Stephen pointed out, auditing the denials on @allowed_prot means > the > >> log will contain false positives of a sort. But this is more of a > noise > >> issue than true false positives. E.g. there are three possible > outcomes > >> for the enclave. > >> > >> - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever. > >> Requesting ALLOW_EXEC is either a straightforward a userspace > bug or > >> a poorly written generic enclave loader. > >> > >> - The enclave conditionally performs EMODPE[PROT_EXEC]. In this > case > >> the denial is a true false positive. > >> > >> - The enclave does EMODPE[PROT_EXEC] and its host userspace then > fails > >> on mprotect(PROT_EXEC), i.e. the LSM denial is working as > intended. > >> The audit log will be noisy, but viewed as a whole the denials > >> aren't > >> false positives. > > > > What I was talking about was EMODPE[PROT_WRITE] on an RX page. > > > >> > >> The potential for noisy audit logs and/or false positives is > unfortunate, > >> but it's (by far) the lesser of many evils. > >> > >>> Theoretically speaking, what you really need is a per page flag > (let's > >>> name it WRITTEN?) indicating whether a page has ever been written to > >>> (or more precisely, granted PROT_WRITE), which will be used to > decide > >>> whether to grant PROT_EXEC when requested in future. Given the fact > >>> that all mprotect() goes through LSM and mmap() is limited to > >>> PROT_NONE, it's easy for LSM to capture that flag by itself instead > of > >> asking user mode code to provide it. > >>> > >>> That said, here is the summary of what I think is a better approach. > >>> * In hook security_file_alloc(), if @file is an enclave, allocate > some > >> data > >>> structure to store for every page, the WRITTEN flag as described > >> above. > >>> WRITTEN is cleared initially for all pages. > >> > >> This would effectively require *every* LSM to duplicate the SGX > driver's > >> functionality, e.g. track per-page metadata, implement locking to > >> prevent races between multiple mm structs, etc... > > > > Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* > are no difference than PROCESS__* or FILE__* flags, which are just > artifacts to assist particular LSMs in decision making. They are never > considered part of the LSM interface, even if other LSMs than SELinux > may adopt the same/similar approach. > > > > If code duplication is what you are worrying about, you can put them > in a library, or implement/export them in some new file (maybe > security/enclave.c?) as utility functions. But spilling them into user > mode is what I think is unacceptable. > > > >> > >>> Open: Given a file of type struct file *, how to tell if it is an > >> enclave (i.e. /dev/sgx/enclave)? > >>> * In hook security_mmap_file(), if @file is an enclave, make sure > >> @prot can > >>> only be PROT_NONE. This is to force all protection changes to go > >> through > >>> security_file_mprotect(). > >>> * In the newly introduced hook security_enclave_load(), set WRITTEN > >> for pages > >>> that are requested PROT_WRITE. > >> > >> How would an LSM associate a page with a specific enclave? vma- > >vm_file > >> will point always point at /dev/sgx/enclave. vma->vm_mm is useless > >> because we're allowing multiple processes to map a single enclave, > not > >> to mention that by mm would require holding a reference to the mm. > > > > Each open("/dev/sgx/enclave") syscall creates a *new* instance of > struct file to uniquely identify one enclave instance. What I mean is > @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode. > > > >> > >>> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, > >> look up > >>> and use WRITTEN flags for all pages within @vma, along with other > >> global > >>> flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) > >> to decide > >>> on allowing/rejecting @prot. > >> > >> vma->vm_file will always be /dev/sgx/enclave at this point, which > means > >> LSMs don't have the necessary anchor back to the source file, e.g. to > >> enforce FILE__EXECUTE. The noexec file system case is also > unaddressed. > > > > vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is > checked by security_enclave_load() using @source_vma->vm_file. Once a > page has been EADD'ed, whether to allow RW->RX depends on .sigstruct > file (more precisely, the file backing SIGSTRUCT), whose FILE__* > attributes could be cached in vma->vm_file->f_security by > security_enclave_init(). > > The RFC series seemed to dispense with the use of the sigstruct file and > just used the source file throughout IIUC. That allowed for reuse of > FILE__* permissions without ambiguity rather than introducing separate > ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of > all checks. That's right. But that's not the distinction between Sean's patch and my proposal. My point here is, from the perspective of LSM architecture, LSM hooks shall be defined to pass adequate information to allow *all* possible implementations to make reasonable decisions. In particular, all parameters to EADD (i.e. target linear address, SECINFO, source page) could affect (current and future) decisions but Sean's definition of security_enclave_load() passes only the source, which limits the possibility of other implementations. Another point I'm trying to make is, different LSM implementations may need different information at any given decision point, therefore it's *not* possible to always pass "right" information at "right" time. And that's why I think .f_security was added to struct file to allow stateful LSMs. Sean's patch however is trying the opposite, as ALLOW_* flags should otherwise be part of internal state of LSMs, but they are spilled into SGX driver and also userspace. > > Regardless, IIUC, your approach requires that we always check > FILE__EXECMOD, and FILE__EXECUTE up front during security_enclave_load() > irrespective of prot so that we can save the result in the f_security > for later use by the mprotect hook. This may generate many spurious > audit messages for cases where PROT_EXEC will never be requested, and > users will be prone to just always allowing it since they cannot tell > when it was actually needed. Yes and no. For those not following this discussion closely, here's the prototype of security_enclave_load() that I proposed in one of my earlier emails. int security_enclave_load(struct file *enclave_fd, unsigned long linear_address, unsigned long nr_pages, int prot, struct vm_area_struct *source_vma); @enclave_fd identifies the enclave to which new pages are being added. @linear_address/@nr_pages specifies the linear range of pages being added. @prot specifies the initial protection of those newly added pages. It is taken from the vma covering the target range. @source_vma covers the source pages in the case of EADD. An LSM is expected to make sure security_file_mprotect(source_vma, prot, prot) would succeed before checking anything else, unless @source_vma is NULL, indicating pages are being EAUG'ed. In all cases, LSM is expected to "remember" @prot for all those pages to be checked in future security_file_mprotect() invocations. Architecture wise, the idea here is for SGX driver to pass in all information relevant for an LSM's decision. Implementation wise, LSM may allow PROT_EXEC depending on FILE__EXECUTE of the source file (@source_vma->vm_file), or the sigstruct file (will be provided to LSM at security_enclave_init), or /dev/sgx/enclave. It makes most sense to me to use the source file, hence the check would most likely be done here. For future security_file_mprotect(), the source file's FILE__EXECMOD could also be cached here, or it could use sigstruct file's FILE__EXECMOD instead. Given the fact that no EPC pages could be accessed before EINIT, the major purpose of security_enclave_load() is for LSM to cache whatever information deemed appropriate for the pages being EADD'ed (i.e. @source_vma != NULL) so that it has necessary information to make decisions in security_file_mprotect() in future. And in that regard the parameter @prot is unnecessary but I decided to have it here for 2 reasons: 1) Pages may be EADD'ed within an existing VMA (so no upcoming mprotect) so LSM's decision is needed right away and 2) @source_vma may not be able to mprotect(@prot) and in that case it'd be better to fail EADD instead of failing at mprotect() later. > > > > > The noexec case should be addressed in IOC_ADD_PAGES by testing > @source_vma->vm_flags & VM_MAYEXEC. > > > >> > >>> * In hook security_file_free(), if @file is an enclave, free > storage > >>> allocated for WRITTEN flags.