Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 22, 2023, Kai Huang wrote:
> On Thu, 2023-06-22 at 15:01 -0700, Sean Christopherson wrote:
> > On Mon, Jun 19, 2023, Kai Huang wrote:
> > > On Fri, 2023-06-16 at 15:05 -0700, Sean Christopherson wrote:
> > > As you can see if we don't pass MAP_FIXED, which is the case for mmap() to
> > > reserve ELRANGE, looks there's no difference between mmap(MAP_ANONYMOUS) and
> > > mmap(/dev/sgx_enclave)?
> > 
> > In practice, there's *probably* no significant difference.  Using an anonymous
> > mapping is advantageous because there's no additional overhead, e.g. for locking
> > the file, it can be done in advance of actually opening /dev/sgx_enclave, helps
> > document (in userspace) that the code isn't actually creating an enclave, just
> > finding a naturally aligned area in virtual memory (which isn't SGX specific), etc.
> 
> Sure, and I agree this perhaps is cleaner and simpler for SGX.  But for this
> purpose I think we should just enforce this policy in SGX driver, i.e., we
> should disable mmap(/dev/sgx_enclave) before ECREATE.
> 
> To me at least the SGX driver should be clear whether it supports userspace to
> use mmap(/dev/sgx_enclave) to reserve ELRANGE.  There should be no ambiguity
> here.
> 
> The current driver only disallows mmap(/dev/sgx_enclave) outside of enclave
> range after EINIT:
> 
> int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>                      unsigned long end, unsigned long vm_flags)
> {
> 	...
> 
>         /* Disallow mapping outside enclave's address range. */
>         if (test_bit(SGX_ENCL_INITIALIZED, &encl->flags) &&
>             (start < encl->base || end > encl->base + encl->size))
>                 return -EACCES;
> }
> 
> which has what value btw?

That code came in after my involvement, see commit 7b013e723a1f ("x86/sgx: Tighten
accessible memory range after enclave initialization").

> > There are definitely differences, e.g. an LSM could restrict access to
> > /dev/sgx_enclave. �That particular one is obviously a "don't care",�
> > 
> 
> I think you meant "RW->RX" requires EXECMOD, etc?

I meant access to /dev/sgx_enclave itself.  What I was pointing out is that,
in theory, an LSM could deny mmap() on /dev/sgx_enclave, but it's a pointless
distinction since in that case userspace can't create SGX enclaves anyways, i.e.
how userspace finds ELRANGE for an enclave it can't create is irrelevant :-)

> Yeah I am not sure whether this matters to reserving ELRANGE.  That part should
> apply to the mmap(/dev/sgx_enclave) after ECREATE.
> 
> > but it's quite
> > difficult to say that mmap(/dev/sgx_enclave) and mmap(NULL) are equivalent due to
> > the amount of code that's involved in handling mmap().
> 
> Sure.  I was talking about only from "allocating size-aligned" part.  Arguably,
> in terms of reserving ELRANGE, userspace only cares about:
> 
>  1) The ELRANGE is big enough to hold the enclave
>  2) The ELRANGE meets alignment requirement
>  3) Usperspace can mmap(/dev/sgx_enclave) and/or mprotect() small areas to get
>     the correct enclave addr within ELRANGE with desired permission
> 
> > 
> > > > > Also, if I am not missing something, the current driver doesn't prevent using
> > > > > mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELANGE.  So having clear
> > > > > documentation is helpful for SGX users to choose how to write their apps.
> > > > > 
> > > > > Go back to the "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
> > > > > this just is "SGX driver requires mmap() after ECREATE/EINIT to use MAP_SHARED |
> > > > > MAP_FIXED and pgoff is ignored".  Or more precisely, pgoff is "not _used_ by SGX
> > > > > driver".
> > > > > 
> > > > > In fact, I think "pgoff is ignored/not used" is technically wrong for enclave.
> > > > 
> > > > Yeah, it's wrong.  It works because, until now apparently, there was never a reason
> > > > a need to care about the file offset since ELRANGE base always provided the necessary
> > > > information.  It wasn't a deliberate design choice, we simply overlooked that detail
> > > > (unless Jarkko was holding back on me all those years ago :-) ).
> > > 
> > > Yeah.  But I am not sure whether there are corner cases that we can have
> > > potential bug around here, since those VMA's are not aligned between the core-mm
> > > and the driver.
> > > 
> > > I haven't thought carefully though.  I guess from core-mm's perspective there
> > > are multi-VMAs mapping to the same/overlapping part of the enclave, while the
> > > SGX driver treats them as mapping to different non-overlapping parts.  Perhaps
> > > it's OK as long as SGX driver doesn't use vma->pgoff to do something???
> > 
> > Ya, exactly.
> > 
> > > > > IMHO we should stop saying SGX driver uses MAP_ANONYMOUS semantics, because the
> > > > > truth is it just takes advantage of MAP_FIXED and carelessly ignores the pgoff
> > > > > due to the nature of SGX w/o considering from core-MM's perspective.
> > > > >   
> > > > > And IMHO there are two ways to fix:
> > > > > 
> > > > > 1) From now on, we ask SGX apps to use the correct pgoff in their
> > > > > mmap(/dev/sgx_enclave).  This shouldn't impact the existing SGX apps because SGX
> > > > > driver doesn't use vma->pgoff anyway.
> > > > 
> > > > Heh, just "asking" won't help.  And breaking userspace, i.e. requiring all apps
> > > > to update, will just get you yelled at :-)
> > > 
> > > We can document properly and assume the new apps will follow.  As I mentioned
> > > above, the old apps, which doesn't/shouldn't have been using fadvice() anyway,
> > > doesn't need to be changed, i.e.,  they should continue to work. :)
> > > 
> > > No?
> > 
> > You can't build an ABI on assumptions.  E.g. even if userspace *intends* to behave,
> > it wouldn't take much to compute the wrong offset (math is hard).
> 
> But I don't think we have an well established ABI now?  Nothing is documented.

Heh, just because the ABI isn't formally documented doesn't mean it doesn't exist.
In fact, the most problematic ABIs are the ones that aren't documented.

> > There is no "supposed" to.  Using mmap(NULL) is purely a suggestion to avoid
> > running into overheads and checks that are ultimately unnecessary.  The only
> > requirement is that userspace provide a compatible chunk for virtual memory,
> > *how* userspace finds that chunk can be done in a number of ways.  mmap(NULL)
> > just happens to be the most convenient one (IMO).
> 
> Even this, I think we should document it IMO, at least whether
> mmap(/dev/sgx_enclave) can be used to reserve ELRANGE.

See below.

> > > To detect whether a VMA has the correct pgoff, we need to somehow mark it in
> > > sgx_mmap(), but currently we don't have facility to do that.  Even we can do,
> > > the core madvice() -> vfs_fadvice() cannot be aware of such VMA either, and
> > > in sgx_fadvice() we may have problem locating the correct VMA to do EAUG
> > > (because the vfs_fadvice()  uses  0 pgoff to calculate the file offset).
> > > 
> > > So, now I think we should just let userspace itself to pass the correct pgoff
> > > when it wants to use fadvice(), which is the option 1) I mentioned above.  The
> > > kernel doesn't/shouldn't need to fix vma->pgoff.
> > > 
> > > In another words, I think:
> > > 
> > >  - For the old apps, doesn't matter, continue to work;
> > >  - For new apps which want to use fadvice() to accelerate EAUG, it should pass�
> > >    the correct pgoff in mmap(/dev/sgx_enclave) even with  MAP_FIXED.
> > > 
> > > And we don't say "SGX driver uses MAP_ANONYMOUS semantics for
> > > mmap(/dev/sgx_enclave) any more".
> > > 
> > > Does this make sense?
> > 
> > Yes, but as above, IMO the kernel needs to enforce that userspace has properly
> > set the offset, otherwise userspace will end up with completely nonsensical
> > behavior if it fails to set the correct offset.  The proposed sgx_fadvise() already
> > takes mmap_lock for read, i.e. is mutually exclusive with sgx_mmap(), so encforcing
> > the requirement should be quite straightforward, e.g.
> > 
> >         mmap_read_lock(current->mm);
> > 
> >         vma = find_vma(current->mm, start);
> 
> The "start" here may already be wrong because before vfs_fadvice() uses VMA's pgoff
> to calculate the start before passing it to sgx_fadvice().

Not my code :-)

> >         if (!vma)
> >                 goto unlock;
> >         if (vma->vm_private_data != encl)
> >                 goto unlock;
> > 	if (encl->has_mismatched_offsets)  <======
> > 		goto unlock;
> 
> Sorry I am a little bit slow, how do you set "has_mismatched_offsests" to true?

During sgx_mmap().  Though there's a wrinkle I initially missed: if the enclave
hasn't gone through ECREATE, encl->base is garbage.  So either sgx_mmap() needs
to assume the !CREATED is creating a mismatched offset, or sgx_encl_create() needs
to iterate over and check all VMAs.

Since there are advantages to usuing mmap(NULL) to find ELRANGE, IMO your best
option is to do the below.  And then that mostly answers the question about
using mmap(/dev/sgx_enclave) to reserve ELRANGE, i.e. if userspace wants to use
fallocate(), then it effectively *must not* use mmap(/dev/sgx_enclave).

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 262f5fb18d74..63fb41da35aa 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -94,6 +94,11 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
        if (ret)
                return ret;
 
+       if (!test_bit(SGX_ENCL_CREATED, &encl->flags) ||
+           vma->vm_pgoff != PFN_DOWN(vma->vm_start - encl->base))
+               encl->has_mismatched_offsests = true;
+
+
        vma->vm_ops = &sgx_vm_ops;
        vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
        vma->vm_private_data = encl;





[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux