Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED

On Tue, 2023-06-06 at 04:11 +0000, Huang, Kai wrote:
> On Fri, 2023-05-26 at 19:32 -0500, Haitao Huang wrote:
> > Hi Kai, Jarkko and Dave
> > 
> > On Thu, 09 Mar 2023 05:31:29 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
> > > 
> > > So I am still a little bit confused about where does "SGX driver uses
> > > MAP_ANONYMOUS semantics for fd-based mmap()" come from.
> > > 
> > > Anyway, we certainly don't want to break userspace.  However, IIUC, even
> > > if from now on we change the driver to depend on userspace to pass the
> > > correct pgoff in mmap(), this won't break userspace, because for old
> > > userspace, which doesn't use fadvise(), pgoff actually doesn't matter.
> > > New userspace which uses fadvise() needs to pass the correct pgoff.
> > > 
> > > I am not saying we should do this, but it doesn't seem we can break
> > > userspace?
> > > 
> > 
> > Sorry for the delayed update, but I thought about this more and am likely to
> > propose a new EAUG ioctl for this and for enabling SGX-CET shadow stack
> > pages. But regardless, I'd like to wrap up this discussion by clarifying
> > this anonymous semantics design in the documentation so people won't get
> > confused in the future.
> > 
> > I think we all agree to keep these semantics so no user space would need to
> > specify 'offset' for mmap with an enclave fd. And here are my proposed
> > documentation changes.
> > 
> > --- a/Documentation/x86/sgx.rst
> > +++ b/Documentation/x86/sgx.rst
> > @@ -100,6 +100,23 @@ pages and establish enclave page permissions.
> >                  sgx_ioc_enclave_init
> >                  sgx_ioc_enclave_provision
> > 
> > +Enclave memory mapping
> > +----------------------
> > +
> > +A file descriptor created from opening **/dev/sgx_enclave** represents an
> > +enclave object. The mmap() syscall with enclave file descriptors does not
> > +support non-zero value for the 'offset' parameter.
> 
> I think we all need to understand better why the SGX driver requires anonymous
> semantics for mmap() against /dev/sgx_enclave and, as a result, requires
> mmap() to pass 0 as pgoff (which, it seems, wasn't even discussed when
> upstreaming the driver).
> 
> I'll do some investigation and try to summarize and report back.  Thanks.
> 

+ Sean.

Hi Sean,

If you see this and have time, please help to comment.  Thanks.

I've spent plenty of time looking into the discussions around v20/v28/v29 and
roughly v38/v39 to find out why the SGX driver requires MAP_ANONYMOUS semantics,
and AFAICT it was never explicitly discussed.  Or perhaps "MAP_ANONYMOUS
semantics" actually just means "MAP_SHARED | MAP_FIXED + pgoff is ignored", and
everyone believed there was no need to explain what "SGX driver uses
MAP_ANONYMOUS semantics for mmap()" means.

Details:

The v20 story mentioned by Haitao (which I spent most of my time on) was
actually about how to make SGX and LSM work together, and was not related to the
SGX driver's mmap() semantics.

Also, Haitao mentioned that "the use of anonymous mapping can be traced back to
v29", but this was actually just about how to use the first mmap() to "reserve
the ELRANGE before ECREATE".  It wasn't about changing mmap(/dev/sgx_enclave)
semantics at all.

Sean actually suggested explicitly documenting "how does SGX driver recommend
the user to reserve ELRANGE", but Jarkko didn't think we should:

https://lore.kernel.org/linux-sgx/20200528111910.GB1666298@xxxxxxxxxxxxxxx/

which is a pity IMHO, because I believe that for anyone the natural first
instinct for reserving ELRANGE is to use mmap(/dev/sgx_enclave) rather than
mmap(MAP_ANONYMOUS).  If we suggest that users use the latter, then there must
be some reason, and IMHO both the suggestion and the reason should be
documented.

Also, if I am not missing something, the current driver doesn't prevent using
mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELRANGE.  So clear documentation
would help SGX users choose how to write their apps.
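
For illustration only, here is a minimal userspace sketch of the
mmap(MAP_ANONYMOUS) way of reserving ELRANGE.  It assumes ELRANGE must be a
power-of-two size and naturally aligned, so it over-allocates and trims.  The
helper name reserve_elrange() is hypothetical and not taken from any SGX
runtime; the later enclave mmap() is only described in a comment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve `size` bytes of address space aligned to
 * `size`, with PROT_NONE so nothing is accessible yet.  `size` is assumed
 * to be a power of two and at least one 4 KiB page.
 * Returns the aligned base, or NULL on failure.
 */
static void *reserve_elrange(size_t size)
{
    assert(size >= 4096 && (size & (size - 1)) == 0);

    /* Over-allocate so an aligned window of `size` bytes must fit. */
    size_t span = size * 2;
    uint8_t *raw = mmap(NULL, span, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t end = start + span;
    uintptr_t base = (start + size - 1) & ~((uintptr_t)size - 1);

    /* Trim the unaligned head and the unused tail of the reservation. */
    if (base > start)
        munmap((void *)start, base - start);
    if (end > base + size)
        munmap((void *)(base + size), end - (base + size));

    /*
     * A later step (not shown) would map enclave pages over this window
     * with MAP_SHARED | MAP_FIXED on the enclave fd, passing 0 as offset.
     */
    return (void *)base;
}
```

Whether this or mmap(/dev/sgx_enclave, PROT_NONE) is the better way to reserve
ELRANGE is exactly the open question; the sketch only shows what the anonymous
variant looks like.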

Going back to "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
this just means "SGX driver requires mmap() after ECREATE/EINIT to use
MAP_SHARED | MAP_FIXED, and pgoff is ignored".  Or more precisely, pgoff is "not
_used_ by the SGX driver".

In fact, I think "pgoff is ignored/not used" is technically wrong for an
enclave.

Ignoring pgoff makes sense for MAP_SHARED | MAP_ANONYMOUS, because you get a new
shmem file every time you do so.  But this isn't the case for an enclave.  For
all mmap()s against the same enclave, pgoff has a valid meaning.  The SGX driver
doesn't use vma->pgoff, so for the driver it's OK not to have a valid
vma->pgoff, but this confuses the core-MM: we can easily end up with multiple
VMAs that map different parts of the enclave while the core-MM believes they all
map the start of the enclave.
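
To make the mismatch concrete, below is a small userspace model (not kernel
code).  It assumes the core-MM derives the file page index of an address as the
VMA's pgoff plus the page distance from the VMA start, and assumes 4 KiB pages.
struct toy_vma and both helpers are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Toy stand-in for the few VMA fields that matter here. */
struct toy_vma {
    uint64_t vm_start;  /* first mapped address */
    uint64_t vm_end;    /* one past the last mapped address */
    uint64_t vm_pgoff;  /* page offset userspace passed to mmap() */
};

/* Page index within the file as the core-MM would compute it. */
static uint64_t file_page_index(const struct toy_vma *vma, uint64_t addr)
{
    assert(addr >= vma->vm_start && addr < vma->vm_end);
    return vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
}

/* Page index within the enclave, which depends only on the address. */
static uint64_t enclave_page_index(uint64_t encl_base, uint64_t addr)
{
    assert(addr >= encl_base);
    return (addr - encl_base) >> PAGE_SHIFT;
}
```

With two VMAs created with pgoff 0, one at the enclave base and one 0x10000
bytes into it, file_page_index() reports page 0 at the start of both, while
enclave_page_index() reports page 0 and page 16 respectively.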

For instance, have we tested all corner cases around VMA splitting/merging, etc?

To conclude:

IMHO we should stop saying the SGX driver uses MAP_ANONYMOUS semantics, because
the truth is that it just takes advantage of MAP_FIXED and carelessly ignores
pgoff due to the nature of SGX, w/o considering the core-MM's perspective.
  
And IMHO there are two ways to fix this:

1) From now on, we ask SGX apps to use the correct pgoff in their
mmap(/dev/sgx_enclave).  This shouldn't impact existing SGX apps because the SGX
driver doesn't use vma->pgoff anyway.

2) To avoid having to ask existing SGX apps to change their mmap()s, we
_officially_ say that userspace isn't required to pass a correct pgoff to mmap()
(i.e. it can keep passing 0, as existing apps do), and the kernel fixes up
vma->pgoff internally.
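
The two options can be sketched with a userspace model, purely as an
illustration and not as a proposed patch.  All names (toy_vma,
mmap_offset_for(), fixup_pgoff(), pgoff_contiguous()) are hypothetical, 4 KiB
pages are assumed, and the contiguity check is my assumption about what the
core-MM wants before it treats two adjacent file-backed VMAs as mergeable.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

struct toy_vma {
    uint64_t vm_start;
    uint64_t vm_end;
    uint64_t vm_pgoff;
};

/* Option 1 (userspace): byte offset to pass as the last mmap() argument. */
static uint64_t mmap_offset_for(uint64_t encl_base, uint64_t addr)
{
    assert(addr >= encl_base);
    return addr - encl_base;
}

/* Option 2 (kernel side, modelled): derive pgoff from the mapping address. */
static void fixup_pgoff(struct toy_vma *vma, uint64_t encl_base)
{
    assert(vma->vm_start >= encl_base);
    vma->vm_pgoff = (vma->vm_start - encl_base) >> PAGE_SHIFT;
}

/* Adjacent VMAs look like one file range only if their pgoffs line up. */
static int pgoff_contiguous(const struct toy_vma *prev,
                            const struct toy_vma *next)
{
    uint64_t prev_pages = (prev->vm_end - prev->vm_start) >> PAGE_SHIFT;

    return prev->vm_end == next->vm_start &&
           prev->vm_pgoff + prev_pages == next->vm_pgoff;
}
```

In this model, two back-to-back four-page VMAs that both carry pgoff 0 are not
contiguous from the file's point of view; after fixup_pgoff() they carry pgoff
0 and 4 and are.  Option 1 would reach the same state by having the app pass
mmap_offset_for() itself.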

I prefer option 2) because it does no harm to anyone: 1) No changes to existing
SGX apps; 2) It aligns with the core-MM, so all existing mmap() operations
should work as expected, meaning no surprises; 3) This patchset from Haitao,
which uses fadvise() to accelerate the EAUG flow, just works.

And I believe we should document all of this so everyone can understand.



