Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2023-02-21 at 19:37 -0600, Haitao Huang wrote:
> On Tue, 21 Feb 2023 16:10:02 -0600, jarkko@xxxxxxxxxx <jarkko@xxxxxxxxxx>  
> wrote:
> 
> > On Fri, Feb 17, 2023 at 05:03:05PM -0600, Haitao Huang wrote:
> > > Hi Jarkko
> > > 
> > > On Fri, 17 Feb 2023 16:32:12 -0600, jarkko@xxxxxxxxxx  
> > > <jarkko@xxxxxxxxxx>
> > > wrote:
> > > 
> > > > On Wed, Feb 15, 2023 at 09:42:30AM -0600, Haitao Huang wrote:
> > > > > On Wed, 15 Feb 2023 02:51:23 -0600, Huang, Kai <kai.huang@xxxxxxxxx>
> > > > > wrote:
> > > > > 
> > > > > > > > > > 
> > > > > > > > 
> > > > > > > > Since sgx_mmap() can happen before enclave is created,
> > > > > calculating the
> > > > > > > > vm_pgoff
> > > > > > > > from enclave_base is conceptually wrong.  Even if you really  
> > > want
> > > > > > > to do
> > > > > > > > it, it
> > > > > > > > should be:
> > > > > > > > 
> > > > > > > > 	if (enclave_has_initialized())
> > > > > > > > 		vma->vm_pgoff = ...;
> > > > > > > 
> > > > > > > I got your point now. I can add a condition to test the
> > > > > SGX_ENCL_CREATED
> > > > > > > bit. However, we still have a hole if we must handle the  
> > > sequence
> > > > > > > mmap(..., enclave_fd) being called before ECREATE ioctl. We
> > > > > can't leave
> > > > > > > vm_pgoff not set for those cases.
> > > > > > > 
> > > > > > > Since no one does that so far, can we explicitly return an error
> > > > > from
> > > > > > > sgx_mmap when that happens?
> > > > > > > Other suggestions?
> > > > > > 
> > > > > > As I replied to patch 4/4, I believe userspace should pass the  
> > > correct
> > > > > > pgoff in
> > > > > > mmap().  It's wrong to always pass 0 or any random value.
> > > > > > 
> > > > > > If userspace follow the mmap() rule, you won't need to manually  
> > > set
> > > > > > vm_pgoff
> > > > > > here (which is hacky IMHO).  Everything works fine.
> > > > > > 
> > > > > 
> > > > > SGX driver was following MAP_ANONYMOUS semantics. If we change that,
> > > > > it'd
> > > > > break current usage/ABI.
> > > > > 
> > > > > I still think returning error for cases mmap(..., enclave_fd) if
> > > > > enclave is
> > > > > not created would be less intrusive change.
> > > > 
> > > > Is this something you care in SGX SDK?
> > > > 
> > > > It is not categorically forbidden so that's why I'm asking.
> > > 
> > > SDK does not care as we would never do this and don't think anyone is  
> > > doing
> > > that either. Suggesting returning error to cover all cases so user space
> > > would not accidentally cause incorrect vm_pgoff set.
> > 
> > vm_pgoff != 0 should result -EINVAL.
> > 
> 
> Do you mean 'offset' passed in from mmap syscall and converted to pgoff in  
> kernel? I can see it only passed to driver in get_unmapped_area. I can add  
> this enforcement there so it is consistent to the MAP_ANONYMOUS spec if  
> that's what you suggest.
> 
>   vma->vm_pgoff is the offset in pages of the vma->vm_start relative to  
> 'file'.
>   It can not be zero for all VMAs of the enclave.
> 
> Kernel sets vma->vm_pgoff == vma->vm_start>>PAGE_SHIFT for private anon  
> VMAs, and ignore it (zero) for shared anon mapping.
> 
> For us, we propose to set it to PFN_DOWN(vma->vm_start - encl->base) upon  
> mmap.

Sorry for late reply.  Basically I was sick and having limited working time in
the passed two weeks.

For what I understand now, SGX driver wants to use MAP_ANONYMOUS semantic for
mmap()s against /dev/sgx_enclave (for whatever reason that I don't understand). 

And because of that, we even want to explicitly enforce userspace to always pass
0 as pgoff in mmap()s (hasn't been done yet).

And then here, we want to rewrite vma->pgoff so that the VMA can act _like_ a
normal file-based mmap() semantics (despite it is indeed a file-based VMA),
because the VFS fadvice() implementation assumes file-based VMAs are always
following file-based mmap() semantics.

Hmm..

Doesn't this sound weird?

I must have been missing something, but why cannot SGX driver just always follow
file-based mmap() semantics at the beginning?

For instance, what's wrong with:

	1) encl_fd = open("/dev/sgx_enclave")
	2) encl_base = mmap(..., encl_size, MAP_SHARED, encl_fd, 0 /* pgoff */)
	3) ioctl(ECREATE, encl_base, encl_size)

(In ioctl(ECREATE), we might want to verify the VMA we found has the same
encl_base as starting address, and 0 pgoff).

And in following mmap()s in which we want to map a small range of enclave:

	encl_addr = mmap(encl_addr, MAP_SHARED|MAP_FIXED, encl_fd,
			(encl_addr - encl_base) >> PAGE_SHIFT);

?

Anything wrong above?

In fact, if the first mmap() in step 2) before IOCTL(ecreate) used
MAP_ANONYMOUS, IIUC that VMA will stay as anonymous VMA but will never be
converted to SGX file-based one, because by looking at the code, I see neither
ioctl(ECREATE) nor ioctl(EINIT) converts that VMA to file-based one.

That being said, all page faults caused by userspace access to that VMA will
still go to anonymous VMA's fault path, but not SGX driver's.

This is fine for SGX1 as all enclave pages are fully populated.  For SGX2,
_unless_ you explicitly mmap("/dev/sgx_enclave") those dynamical ranges, the
fault will never be handled by SGX driver.

Architecturally, for SGX2, if you mmap() the entire enclave range (as in step
2), you don't need explicitly mmap() all dynamic ranges.  The userspace should
just be able to access those dynamic ranges (i.e. EACCEPT) and the kernel driver
should gracefully handle the fault.

Shouldn't this be a problem?

(Btw, there might be other corner cases that could cause VMA splitting/merging,
etc, but I haven't thought about those)


> The concern Kai raised about encl->base not available in the window  
> between mmap and ECREATE can be addressed by disallowing mmap before  
> ECREATE is done. It does not make much sense anyway to mmap(enclave_fd)  
> without a valid enclave range associated with enclave_fd.
> 

We can, but I don't understand why the first mmap() before ECREATE must/should
be done via MMAP_ANONYMOUS.  In fact, I think it might be wrong for SGX2.









[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux