Re: [PATCH v4 3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 25, 2024 at 01:21:56PM +1200, Huang, Kai wrote:
>
> > Two enclave threads may try to add and remove the same enclave page
> > simultaneously (e.g., if the SGX runtime supports both lazy allocation
> > and MADV_DONTNEED semantics). Consider some enclave page added to the
> > enclave. User space decides to temporarily remove this page (e.g.,
> > emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> > space performs a memory access on the same page on CPU2, which results
> > in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> > follows:
> >
> >   [ ... skipped ... ]
> >
> > Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> > same page. This enclave page becomes perpetually inaccessible (until
> > another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> > marked accessible in the PTE entry but is not EAUGed, and any subsequent
> > access to this page raises a fault: with the kernel believing there to
> > be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> > path do_user_addr_fault() -> access_error() causes the SGX driver's
> > sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> > The userspace SIGSEGV handler cannot perform EACCEPT because the page
> > was not EAUGed. Thus, the user space is stuck with the inaccessible
> > page.
>
> Reading the code, it seems the ioctl(sgx_ioc_enclave_modify_types) also zaps
> EPC mapping when converting a normal page to TSC.  Thus IIUC it should also
> suffer this issue?

Technically yes, sgx_enclave_modify_types() has a similar code path and
can be patched in a similar way.

Practically though, I can't imagine an SGX program or framework to allow a
scenario when CPU1 modifies the type of the enclave page from REG to TCS
and at the same time CPU2 performs a memory access on the same page. This
would be clearly a bug in the SGX program/framework. For example, Gramine
always follows the path of: create a new REG enclave page, modify it to
TCS, only then start using it; i.e., there is never a point in time at
which the REG page is allocated and ready to be converted to a TCS page,
and some other thread/CPU accesses it in-between these steps.

TLDR: I can add similar handling to sgx_enclave_modify_types() if
reviewers insist, but I don't see how this data race can ever be
triggered by benign real-world SGX applications.

--
Dmitrii Kuvaiskii




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux