Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 19, 2023, Kai Huang wrote:
> On Fri, 2023-06-16 at 15:05 -0700, Sean Christopherson wrote:
> > But I think this is unrelated to what you really care about, e.g. a userspace that
> > tightly controls its virtual memory could hardcode enclave placement (IIRC graphene
> > did/does do that).  I.e. the alignment issue is a completely different discussion.
> > 
> > [*] https://lore.kernel.org/all/20190522153836.GA24833@xxxxxxxxxxxxxxx
> 
> Right.  For the sake of making progress of this Haitao's series, we want to
> understand where did "SGX driver requires mmap(/dev/sgx_enclave) to use
> MAP_ANONYMOUS semantics" come from.
> 
> But for this topic (how to reserve ELRANGE).  I am not sure the reason that we
> prefer mmap(MAP_ANONYMOUS) still stands.  For instance, the discussion in your
> above link was old implementation/assumption -- e.g., MAP_FIXED wasn't even
> used/supported for mmap()ing enclave chunks.
> 
> So I am wondering now whether we suggest user to use mmap(MAP_ANONYMOUS) to get
> a size-aligned address still stands?  The current SGX driver's
> get_unmapped_area:
> 
> static unsigned long sgx_get_unmapped_area(struct file *file,                  
>                                            unsigned long addr,
>                                            unsigned long len,                  
>                                            unsigned long pgoff,                
>                                            unsigned long flags)                
> {       
>         if ((flags & MAP_TYPE) == MAP_PRIVATE)                                 
>                 return -EINVAL;
>         
>         if (flags & MAP_FIXED)
>                 return addr;
>                         
>         return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);  
> }               
>  
> 
> As you can see if we don't pass MAP_FIXED, which is the case for mmap() to
> reserve ELRANGE, looks there's no difference between mmap(MAP_ANONYMOUS) and
> mmap(/dev/sgx_enclave)?

In practice, there's *probably* no significant difference.  Using an anonymous
mapping is advantageous because there's no additional overhead, e.g. for locking
the file, it can be done in advance of actually opening /dev/sgx_enclave, helps
document (in userspace) that the code isn't actually creating an enclave, just
finding a naturally aligned area in virtual memory (which isn't SGX specific), etc.

There are definitely differences, e.g. an LSM could restrict access to
/dev/sgx_enclave.  That particular one is obviously a "don't care", but it's quite
difficult to say that mmap(/dev/sgx_enclave) and mmap(NULL) are equivalent due to
the amount of code that's involved in handling mmap().

> > > Also, if I am not missing something, the current driver doesn't prevent using
> > > mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELANGE.  So having clear
> > > documentation is helpful for SGX users to choose how to write their apps.
> > > 
> > > Go back to the "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
> > > this just is "SGX driver requires mmap() after ECREATE/EINIT to use MAP_SHARED |
> > > MAP_FIXED and pgoff is ignored".  Or more precisely, pgoff is "not _used_ by SGX
> > > driver".
> > > 
> > > In fact, I think "pgoff is ignored/not used" is technically wrong for enclave.
> > 
> > Yeah, it's wrong.  It works because, until now apparently, there was never a reason
> > a need to care about the file offset since ELRANGE base always provided the necessary
> > information.  It wasn't a deliberate design choice, we simply overlooked that detail
> > (unless Jarkko was holding back on me all those years ago :-) ).
> 
> Yeah.  But I am not sure whether there are corner cases that we can have
> potential bug around here, since those VMA's are not aligned between the core-mm
> and the driver.
> 
> I haven't thought carefully though.  I guess from core-mm's perspective there
> are multi-VMAs mapping to the same/overlapping part of the enclave, while the
> SGX driver treats them as mapping to different non-overlapping parts.  Perhaps
> it's OK as long as SGX driver doesn't use vma->pgoff to do something???

Ya, exactly.

> > > IMHO we should stop saying SGX driver uses MAP_ANONYMOUS semantics, because the
> > > truth is it just takes advantage of MAP_FIXED and carelessly ignores the pgoff
> > > due to the nature of SGX w/o considering from core-MM's perspective.
> > >   
> > > And IMHO there are two ways to fix:
> > > 
> > > 1) From now on, we ask SGX apps to use the correct pgoff in their
> > > mmap(/dev/sgx_enclave).  This shouldn't impact the existing SGX apps because SGX
> > > driver doesn't use vma->pgoff anyway.
> > 
> > Heh, just "asking" won't help.  And breaking userspace, i.e. requiring all apps
> > to update, will just get you yelled at :-)
> 
> We can document properly and assume the new apps will follow.  As I mentioned
> above, the old apps, which doesn't/shouldn't have been using fadvice() anyway,
> doesn't need to be changed, i.e.,  they should continue to work. :)
> 
> No?

You can't build an ABI on assumptions.  E.g. even if userspace *intends* to behave,
it wouldn't take much to compute the wrong offset (math is hard).

> > > 2) For the sake of avoiding having to ask existing SGX apps to change their
> > > mmap()s, we _officially_ say that userspace isn't required to pass a correct
> > > pgoff to mmap() (i.e. passing 0 as did in existing apps), but the kernel should
> > > fix the vma->pgoff internally.
> > 
> > I recommend you don't do this.  Overwriting vm_pgoff would probably work, but it's
> > going to make a flawed API even messier.  E.g. you'll have painted SGX into a
> > corner if you ever want to decouple vma->start/end from encl->base.  I highly
> > doubt that will ever happen given how ELRANGE works, but I don't think a hack-a-fix
> > buys you enough to justify any more ugliness.
> 
> I also found it's not feasible to cleanly fix the pgoff from userspace (I
> thought the pgoff could be fixed at very early stage of do_mmap() so everything
> later just worked, but it's not the case).  In do_mmap() the pgoff from
> userspace is already used to VMA merging/splitting etc before creating the
> target VMA.
> 
> Hmm.. Now I think we shouldn't silently fix pgoff in SGX driver as it may
> surprise the core-mm later because the core-mm has already done some job around
> VMAs before vma->pgoff is changed.
> 
> > 
> > > I do prefer option 2) because it has no harm to anyone: 1) No changes to
> > > existing SGX apps; 2) It aligns with the core-MM to so all existing mmap()
> > > operations should work as expected, meaning no surprise; 3) And this patchset
> > > from Haitao to use fadvice() to accelerate EAUG flow just works.
> > 
> > I think you can have your cake and eat it too.  IIUC, the goal is to get fadvise()
> > working, and to do that without creating a truly heinous uAPI, you need an accurate
> > vm_pgoff.  So, use a carrot and stick approach.
> > 
> > If userspace properly defines vm_pgoff during mmap() and doesn't specify MAP_ANONYMOUS,
> > then they get to use fadvise() (give 'em a carrot).  But if *any* mmap() on the
> > enclave doesn't followo those rules, mark the enclave as tainted or whatever and
> > disallow fadvise() (hit 'em with a stick).
> 
> If we are supposed to use mmap(MAP_ANONYMOUS) to reserve ELRANGE, then I think
> userspace will just use MAP_FIXED for all mmap()s to /dev/sgx_enclave.

There is no "supposed" to.  Using mmap(NULL) is purely a suggestion to avoid
running into overheads and checks that are ultimately unnecessary.  The only
requirement is that userspace provide a compatible chunk for virtual memory,
*how* userspace finds that chunk can be done in a number of ways.  mmap(NULL)
just happens to be the most convenient one (IMO).

> To detect whether a VMA has the correct pgoff, we need to somehow mark it in
> sgx_mmap(), but currently we don't have facility to do that.  Even we can do,
> the core madvice() -> vfs_fadvice() cannot be aware of such VMA either, and
> in sgx_fadvice() we may have problem locating the correct VMA to do EAUG
> (because the vfs_fadvice()  uses  0 pgoff to calculate the file offset).
> 
> So, now I think we should just let userspace itself to pass the correct pgoff
> when it wants to use fadvice(), which is the option 1) I mentioned above.  The
> kernel doesn't/shouldn't need to fix vma->pgoff.
> 
> In another words, I think:
> 
>  - For the old apps, doesn't matter, continue to work;
>  - For new apps which want to use fadvice() to accelerate EAUG, it should pass�
>    the correct pgoff in mmap(/dev/sgx_enclave) even with  MAP_FIXED.
> 
> And we don't say "SGX driver uses MAP_ANONYMOUS semantics for
> mmap(/dev/sgx_enclave) any more".
> 
> Does this make sense?

Yes, but as above, IMO the kernel needs to enforce that userspace has properly
set the offset, otherwise userspace will end up with completely nonsensical
behavior if it fails to set the correct offset.  The proposed sgx_fadvise() already
takes mmap_lock for read, i.e. is mutually exclusive with sgx_mmap(), so encforcing
the requirement should be quite straightforward, e.g.

        mmap_read_lock(current->mm);

        vma = find_vma(current->mm, start);
        if (!vma)
                goto unlock;
        if (vma->vm_private_data != encl)
                goto unlock;
	if (encl->has_mismatched_offsets)  <======
		goto unlock;

        pos = start;
        if (pos < vma->vm_start || end > vma->vm_end) {
                /* Don't allow any gaps */
                goto unlock;
        }




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux