On 6/11/19 6:55 PM, Xing, Cedric wrote:
From: linux-sgx-owner@xxxxxxxxxxxxxxx [mailto:linux-sgx-owner@xxxxxxxxxxxxxxx] On Behalf Of Stephen Smalley
Sent: Tuesday, June 11, 2019 6:40 AM
+#ifdef CONFIG_INTEL_SGX
+ rc = sgxsec_mprotect(vma, prot);
+ if (rc <= 0)
+ return rc;
Why are you skipping the file_map_prot_check() call when rc == 0?
What would SELinux check if you didn't do so -
FILE__READ|FILE__WRITE|FILE__EXECUTE to /dev/sgx/enclave? Is it a
problem to let SELinux proceed with that check?
We can continue the check. But in practice, all of FILE__{READ|WRITE|EXECUTE} are needed for every enclave, so what's the point of checking them? FILE__EXECMOD may be the only flag that has a meaning, but it's kind of redundant because the sigstruct file was checked against that already.
I don't believe FILE__EXECMOD will be checked since it is a shared file
mapping. We'll check at least FILE__READ and FILE__WRITE anyway upon
open(), and possibly FILE__EXECUTE upon mmap() unless that is never
PROT_EXEC. We want the policy to accurately reflect the operations of
the system, even when an operation "must" be allowed, and even here this
only needs to be allowed to processes authorized as enclave loaders, not
to all processes.
I don't think there are other examples where we skip a SELinux check
like this. If we were to do so here, we would at least need a comment
explaining that it was intentional and why. The risk would be that
future checking added into file_map_prot_check() would be unwittingly
bypassed for these mappings. A warning there would also be advisable if
we skip it for these mappings.
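
To make the control flow under discussion concrete, here is a minimal userspace sketch of the "continue the check" option. Everything in it is a stand-in: sk_sgxsec_mprotect() and sk_file_map_prot_check() are hypothetical stubs, not the kernel functions, and only the return-value convention (negative = error, 0 = enclave VMA allowed, positive = not an enclave VMA) is taken from the patch.

```c
#include <errno.h>

/* Counts how often the (stubbed) regular file check runs. */
static int sk_file_check_calls;

/* Stub: <0 on denial, 0 for an allowed enclave VMA, >0 for a non-enclave VMA. */
static int sk_sgxsec_mprotect(int is_enclave, int allowed)
{
	if (!is_enclave)
		return 1;
	return allowed ? 0 : -EACCES;
}

/* Stub for the regular SELinux file check; always allows in this sketch. */
static int sk_file_map_prot_check(void)
{
	sk_file_check_calls++;
	return 0;
}

/*
 * Only a negative result short-circuits. Both 0 (enclave, allowed) and
 * >0 (non-enclave) fall through to the regular file check, so a check
 * added there later cannot be bypassed for enclave mappings.
 */
static int sk_mprotect_hook(int is_enclave, int allowed)
{
	int rc = sk_sgxsec_mprotect(is_enclave, allowed);

	if (rc < 0)
		return rc;
	return sk_file_map_prot_check();
}
```

With "rc < 0" instead of "rc <= 0", the enclave case is no longer special-cased, and no explanatory comment or warning is needed.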
+static int selinux_enclave_load(struct file *encl, unsigned long addr,
+ unsigned long size, unsigned long prot,
+ struct vm_area_struct *source)
+{
+ if (source) {
+ /**
+ * Adding page from source => EADD request
+ */
+ int rc = selinux_file_mprotect(source, prot, prot);
+ if (rc)
+ return rc;
+
+ if (!(prot & VM_EXEC) &&
+ selinux_file_mprotect(source, VM_EXEC, VM_EXEC))
I wouldn't conflate VM_EXEC with PROT_EXEC even if they happen to be
defined with the same values currently. Elsewhere the kernel appears to
explicitly translate them ala calc_vm_prot_bits().
Thanks! I'll change them to PROT_EXEC in the next version.
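
For illustration, a bit-by-bit translation in the spirit of calc_vm_prot_bits() could look like the userspace sketch below. The SK_VM_* values and sk_prot_to_vm() are made up for this example and are not the kernel's definitions; the point is only that the code never relies on the two flag sets sharing values.

```c
#include <sys/mman.h>

/* Illustrative values for this sketch only, not taken from kernel headers. */
#define SK_VM_READ  0x10UL
#define SK_VM_WRITE 0x20UL
#define SK_VM_EXEC  0x40UL

/* Translate PROT_* bits into the sketch's VM_* bits, one bit at a time. */
static unsigned long sk_prot_to_vm(unsigned long prot)
{
	unsigned long vm = 0;

	if (prot & PROT_READ)
		vm |= SK_VM_READ;
	if (prot & PROT_WRITE)
		vm |= SK_VM_WRITE;
	if (prot & PROT_EXEC)
		vm |= SK_VM_EXEC;
	return vm;
}
```

The SK_VM_* values deliberately differ from the PROT_* ones, so passing one kind of flag where the other is expected shows up immediately.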
Also, this will mean that we will always perform an execute check on all
sources, thereby triggering audit denial messages for any EADD sources
that are only intended to be data. Depending on the source, this could
trigger PROCESS__EXECMEM or FILE__EXECMOD or FILE__EXECUTE. In a world
where users often just run any denials they see through audit2allow,
they'll end up always allowing them all. How can they tell whether it
was needed? It would be preferable if we could only trigger execute
checks when there is some probability that execute will be requested in
the future. Alternatives would be to silence the audit of these
permission checks always via use of _noaudit() interfaces or to silence
audit of these permissions via dontaudit rules in policy, but the latter
would hide all denials of the permission by the process, not just those
triggered from security_enclave_load(). And if we silence them, then we
won't see them even if they were needed.
*_noaudit() is exactly what I wanted. But I couldn't find selinux_file_mprotect_noaudit()/file_has_perm_noaudit(), and I'm reluctant to duplicate code. Any suggestions?
I would have no objection to adding _noaudit() variants of these, either
duplicating code (if sufficiently small/simple) or creating a common
helper with a bool audit flag that gets used for both. But the larger
issue would be to resolve how to ultimately ensure that a denial is
audited later if the denied permission is actually requested and blocked
via sgxsec_mprotect().
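
A rough sketch of the "common helper with a bool audit flag" variant, in plain userspace C. All names here (sk_has_perm_common() and friends) are hypothetical and the permission model is reduced to a bitmask; it is not the SELinux AVC interface, it only shows the shape of the refactoring.

```c
#include <errno.h>
#include <stdbool.h>

/* Number of denial records "emitted"; stands in for the audit subsystem. */
static int sk_audit_count;

/* Common helper: check requested bits against allowed bits. */
static int sk_has_perm_common(unsigned int allowed, unsigned int requested,
			      bool audit)
{
	unsigned int denied = requested & ~allowed;

	if (!denied)
		return 0;
	if (audit)
		sk_audit_count++;	/* stand-in for logging the denial */
	return -EACCES;
}

/* Audited variant, used where a denial is a real failure. */
static int sk_has_perm(unsigned int allowed, unsigned int requested)
{
	return sk_has_perm_common(allowed, requested, true);
}

/* Silent variant, used for speculative checks at EADD time. */
static int sk_has_perm_noaudit(unsigned int allowed, unsigned int requested)
{
	return sk_has_perm_common(allowed, requested, false);
}
```

Both variants return the same decision; only the audited one leaves a record. This does not by itself solve the larger issue of auditing the denial later.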
+ prot = 0;
+ else {
+ prot = SGX__EXECUTE;
+ if (source->vm_file &&
+ !file_has_perm(current_cred(), source->vm_file,
+ FILE__EXECMOD))
+ prot |= SGX__EXECMOD;
Similarly, this means that we will always perform a FILE__EXECMOD check
on all executable sources, triggering audit denial messages for any EADD
source that is executable but to which EXECMOD is not allowed, and again
the most common pattern will be that users will add EXECMOD to all
executable sources to avoid this.
+ }
+ return sgxsec_eadd(encl, addr, size, prot);
+ } else {
+ /**
+ * Adding page from NULL => EAUG request
+ */
+ return sgxsec_eaug(encl, addr, size, prot);
+ }
+}
+
+static int selinux_enclave_init(struct file *encl,
+ const struct sgx_sigstruct *sigstruct,
+ struct vm_area_struct *vma)
+{
+ int rc = 0;
+
+ if (!vma)
+ rc = -EINVAL;
Is it ever valid to call this hook with a NULL vma? If not, this should
be handled/prevented by the caller. If so, I'd just return -EINVAL
immediately here.
vma shall never be NULL. I'll update it in the next version.
+
+ if (!rc && !(vma->vm_flags & VM_EXEC))
+ rc = selinux_file_mprotect(vma, VM_EXEC, VM_EXEC);
I had thought we were trying to avoid overloading FILE__EXECUTE (or
whatever gets checked here, e.g. could be PROCESS__EXECMEM or
FILE__EXECMOD) on the sigstruct file, since the caller isn't truly
executing code from it.
Agreed. Another problem with FILE__EXECMOD on the sigstruct file is that user code would then be allowed to modify SIGSTRUCT at will, which effectively wipes out the protection provided by FILE__EXECUTE.
I'd define new ENCLAVE__* permissions, including an up-front
ENCLAVE__INIT permission that governs whether the sigstruct file can be
used at all irrespective of memory protections.
Agreed.
Then you can also have ENCLAVE__EXECUTE, ENCLAVE__EXECMEM,
ENCLAVE__EXECMOD for the execute-related checks. Or you can use the
/dev/sgx/enclave inode as the target for the execute checks and just
reuse the file permissions there.
Now we've got 2 options - 1) New ENCLAVE__* flags on sigstruct file or 2) FILE__* on /dev/sgx/enclave. Which one do you think makes more sense?
ENCLAVE__EXECMEM seems to offer finer granularity (than PROCESS__EXECMEM) but I wonder if it'd have any real use in practice.
Defining a separate ENCLAVE__EXECUTE and using it here for the sigstruct
file would avoid any ambiguity with the FILE__EXECUTE check to the
/dev/sgx/enclave inode that might occur upon mmap() or mprotect(). A
separate ENCLAVE__EXECMEM would enable allowing WX within the enclave
while denying it in the host application or vice versa, which could be a
good thing for security, particularly if SGX2 largely ends up always
wanting WX.
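
As a toy illustration of that separation (the SK_* bit values and sk_wx_allowed() are invented for this sketch and are not real SELinux permission definitions), keeping the two permissions distinct lets policy grant WX inside the enclave without granting it to the host process, or the reverse:

```c
#include <stdbool.h>

/* Hypothetical permission bits for this sketch only. */
#define SK_PROCESS_EXECMEM 0x1u
#define SK_ENCLAVE_EXECMEM 0x2u

/* Decide whether a writable+executable mapping is allowed. */
static bool sk_wx_allowed(unsigned int granted, bool is_enclave_mapping)
{
	unsigned int need = is_enclave_mapping ? SK_ENCLAVE_EXECMEM
					       : SK_PROCESS_EXECMEM;

	return (granted & need) != 0;
}
```

A loader granted only SK_ENCLAVE_EXECMEM can then create WX enclave pages while still being denied WX on its own anonymous memory.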
+int sgxsec_mprotect(struct vm_area_struct *vma, size_t prot) {
+ struct enclave_sec *esec;
+ int rc;
+
+ if (!vma->vm_file || !(esec = __esec(selinux_file(vma->vm_file)))) {
+ /* Positive return value indicates non-enclave VMA */
+ return 1;
+ }
+
+ down_read(&esec->sem);
+ rc = enclave_mprotect(&esec->regions, vma->vm_start, vma->vm_end,
+ prot);
Why is it safe for this to only use down_read()? enclave_mprotect() can
call enclave_prot_set_cb() which modifies the list?
Probably because it was too late at night when I wrote this line:-( Good catch!
I haven't looked at this code closely, but it feels like a lot of SGX-
specific logic embedded into SELinux that will have to be repeated or
reused for every security module. Does SGX not track this state itself?
I can tell you have looked quite closely, and I truly thank you for your time!
You are right that there is SGX-specific stuff. More precisely, SGX enclaves don't have access to anything except memory, so there are only 3 questions that need to be answered for each enclave page: 1) whether X is allowed; 2) whether W->X is allowed; and 3) whether WX is allowed. This proposal tries to cache the answers to those questions upon creation of each enclave page, meaning it involves a) figuring out the answers and b) remembering them for every page. #b is generic, mostly captured in intel_sgx.c, and could be shared among all LSM modules, while #a is SELinux-specific. I could move intel_sgx.c up one level in the directory hierarchy if that's what you'd suggest.
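
To illustrate the #b part, here is a simplified userspace sketch of caching the three answers per page and consulting them on a later mprotect(). The struct, the SK_ALLOW_* bits and sk_page_mprotect() are hypothetical, and the way "was writable" is tracked is one possible reading of W->X, not necessarily what the patch does.

```c
#include <errno.h>
#include <sys/mman.h>

/* Cached answers to the three per-page questions (sketch-only values). */
#define SK_ALLOW_X   0x1u	/* 1) page may be executable */
#define SK_ALLOW_W2X 0x2u	/* 2) page may turn executable after being writable */
#define SK_ALLOW_WX  0x4u	/* 3) page may be writable and executable at once */

struct sk_page_state {
	unsigned int allowed;	/* computed once, when the page is added */
	int was_writable;	/* set the first time the page is made writable */
};

/* Check a requested protection against the cached answers. */
static int sk_page_mprotect(struct sk_page_state *pg, int prot)
{
	if (prot & PROT_EXEC) {
		if (!(pg->allowed & SK_ALLOW_X))
			return -EACCES;
		if ((prot & PROT_WRITE) && !(pg->allowed & SK_ALLOW_WX))
			return -EACCES;
		if (pg->was_writable && !(pg->allowed & SK_ALLOW_W2X))
			return -EACCES;
	}
	if (prot & PROT_WRITE)
		pg->was_writable = 1;
	return 0;
}
```

Nothing in this part depends on how the answers were obtained, which is why it could be shared among LSM modules.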
By "SGX", did you mean the SGX subsystem being upstreamed? It doesn’t track that state. In practice, there's no way for SGX to track it because there's no vm_ops->may_mprotect() callback. It doesn't follow the philosophy of Linux either, as mprotect() doesn't track it for regular memory. And it doesn't have a use without LSM, so I believe it makes more sense to track it inside LSM.
Yes, the SGX driver/subsystem. I had the impression from Sean that it
does track this kind of per-page state already in some manner, but
possibly he means it does under a given proposal and not in the current
driver.
Even the #b remembering might end up being SELinux-specific if we also
have to remember the original inputs used to compute the answer so that
we can audit that information when access is denied later upon
mprotect(). At the least we'd need it to save some opaque data and pass
it to a callback into SELinux to perform that auditing.
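
One possible shape for that, sketched in userspace C: the generic code stores an opaque pointer plus a callback per region and invokes the callback only when a later request is actually denied. All names (sk_region, sk_audit_cb, sk_selinux_audit) are hypothetical and the permission model is again just a bitmask.

```c
#include <errno.h>
#include <stddef.h>

/* Callback supplied by the security module to report a deferred denial. */
typedef void (*sk_audit_cb)(void *opaque);

/* Generic per-region state; the audit data stays opaque to this layer. */
struct sk_region {
	unsigned int allowed;	/* cached decision from enclave load time */
	void *audit_data;	/* original inputs, owned by the security module */
	sk_audit_cb audit;	/* may be NULL if nothing needs auditing */
};

/* Generic check: on denial, hand the saved data back to the module. */
static int sk_region_mprotect(struct sk_region *r, unsigned int requested)
{
	if (!(requested & ~r->allowed))
		return 0;
	if (r->audit)
		r->audit(r->audit_data);
	return -EACCES;
}

/* Example module-side data and callback. */
struct sk_selinux_audit {
	const char *source_path;	/* file the page was loaded from */
	int reported;			/* how many denials were reported */
};

static void sk_selinux_audit_cb(void *opaque)
{
	struct sk_selinux_audit *a = opaque;

	a->reported++;	/* stand-in for emitting the real audit record */
}
```

This keeps the silent check at load time and the audited denial at mprotect() time tied to the same saved inputs.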