Re: [PATCH 05/25] x86/sgx: Introduce runtime protection bits

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Fri, 3 Dec 2021 14:12:17 -0800

Hi Andy,

On 12/3/2021 11:28 AM, Andy Lutomirski wrote:
On 12/1/21 11:23, Reinette Chatre wrote:
Enclave creators declare their paging permission intent at the time
the pages are added to the enclave. These paging permissions are
vetted when pages are added to the enclave and stashed off
(in sgx_encl_page->vm_max_prot_bits) for later comparison with
enclave PTEs.

I'm a bit confused here. ENCLU[EMODPE] allows the enclave to change the 
EPCM permission bits however it likes with no oversight from the kernel. 
  So we end up with a whole bunch of permission masks:

Before jumping to the permission masks I would like to step back and 
just confirm the context. We need to consider the following three 
permissions:

EPCM permissions: the enclave page permissions maintained in the SGX 
hardware. The OS is constrained here in that it cannot query the current 
EPCM permissions. Even so, the OS needs to ensure PTEs are installed 
appropriately (we do not want a RW PTE for a read-only enclave page) and 
thus the OS keeps its own record of EPCM permissions to support this.
As you note later even in current kernel the enclave can change these 
permissions without OS knowing. EPCM permissions can only be relaxed 
without the OS knowledge though so the OS record of EPCM permissions can 
only ever be stricter than the actual EPCM permissions.

VMA permissions: Current behavior (not changed in this series) is that 
the OS enforces that a new VMA should have the same or weaker 
permissions than the EPCM permissions.

Page table entries: These should match the EPCM permissions without 
exceeding VMA permissions.

The PTE: controlled by complex kernel policy

The VMA: with your series, this is entirely controlled by userspace.  I 
think I'm fine with that.

vm_max_prot_bits: populated from secinfo at setup time, unless I missed 
something that changes it later.  Maybe I'm confused or missed something 
in one of the patches,

Yes, vm_max_prot_bits is currently and continues to be populated from 
secinfo for pages added before the enclave is initialized and in a later 
patch it would be hardcoded to RW for pages that are added after the 
enclave is initialized.  In the current implementation vm_max_prot_bits 
is the OS's record of the EPCM permissions used to guide VMA and PTE 
permissions.

On a higher level, the implementation decision is that vm_max_prot_bits 
is the static "vetted" permissions of a page - the maximum permissions a 
page is allowed to have during its entire lifetime. This matches the 
current implementation. In the current implementation permissions are 
only able to change via VMA and PTE ... for example a read-only VMA can 
access an enclave page with vm_max_prot_bits of RW. With the SGX2 
support permission changes are allowed to EPCM permissions - but in this 
implementation they are not allowed to exceed the originally vetted 
vm_max_prot_bits.

In this SGX2 implementation an enclave page could thus be added to an 
enclave with secinfo and vm_max_prot_bits of RW that would only allow 
that page to have R or RW permissions (VMA, PTE, and OS view of EPCM 
permissions) in its lifetime, never RX or RWX. Yes, it may be possible 
for the enclave to change the EPCM permissions from within the enclave 
using ENCLU[EMODPE] but to access the page the enclave would need the OS 
to install the appropriate PTE and the OS would not do so if 
vm_max_prot_bits does not allow it. Neither would the OS allow an 
executable VMA.

vm_run_prot_bits: populated from some combination of ioctls.  I'm 
entirely lost as to what this is for.

With SGX2 it is possible to change the EPCM permissions of an enclave 
page after the enclave is initialized. vm_max_prot_bits would provide 
guidance to what permissions a page is allowed to have while 
vm_run_prot_bits contains the current view of EPCM permissions used by 
the OS to guide whether requested VMA permissions are allowed and guide 
what PTE permissions should be.

Consider this example how vm_max_prot_bits and vm_run_prot_bits are used:

(1) Add enclave page with secinfo of RW to uninitialized enclave
    vm_max_prot_bits = RW
    vm_run_prot_bits = RW

(2) User space runs SGX_IOC_PAGE_MODP to change the permissions to read-
    only. This is allowed because vm_max_prot_bits = RW. Now:
    vm_max_prot_bits = RW
    vm_run_prot_bits = R

    Now VMAs are created and PTEs installed based on value of
    vm_run_prot_bits - write access will not be allowed.

(3) User space runs SGX_IOC_PAGE_MODP to change the permissions to RX.
    This will be denied because vm_max_prot_bits = RW.

(3) User space runs SGX_IOC_PAGE_MODP to change the permissions to RW.
    This will be allowed because vm_max_prot_bits = RW.

    Now VMAs are created and PTEs installed based on value of
    vm_run_prot_bits - write access will again be allowed.

EPCM bits: controlled by the guest.  basically useless for any host 
purpose on SGX2 hardware (with or without kernel support -- the enclave 
can do ENCLU[EMODPE] whether we like it or not, even on old kernels)

Indeed - permissions can only be relaxed without the OS knowledge so the 
OS's view would always be the same or stricter than the enclave.

So I guess I don't understand the purpose of this patch    or of the 
rules in the later patches, and I feel like this is getting more 
complicated than makes sense.

Could we perhaps make vm_max_prot_bits dynamic or at least controllable 
in some useful way?  My initial proposal (years ago) was for 
vm_max_prot_bits to be *separately* configured at initial load time 
instead of being inferred from secinfo with the intent being that the 
user untrusted runtime would set it appropriately.  I have no problem 
with allowing runtime changes as long as the security policy makes sense 
and it's kept consistent with PTEs.

This SGX2 enabling indeed builds on the current implementation where 
vm_max_prot_bits is inferred from secinfo. The intent is for 
vm_max_prot_bits to reflect the maximum allowed vetted permissions.

At this time vm_max_prot_bits is indeed static and thus creates the need 
for (dynamic) vm_run_prot_bits that reflects the current EPCM 
permissions and guides VMA and PTE permissions while vm_max_prot_bits 
guides new permission requests. From what I understand this 
implementation follows the current security policy - permissions are 
never allowed to exceed the originally vetted permissions. PTEs are kept 
consistent in that they match the (vetted, OS view of) EPCM permissions.

Also, I think we need a changelog message or, even better, actual docs 
in kernel, explaining the actual final set of rules and invariants for 
all these masks.

I will add a section to Documentation/x86/sgx.rst.

Reinette