On Fri, 2023-06-16 at 15:05 -0700, Sean Christopherson wrote: > On Fri, Jun 16, 2023, Kai Huang wrote: > > On Tue, 2023-06-06 at 04:11 +0000, Huang, Kai wrote: > > > On Fri, 2023-05-26 at 19:32 -0500, Haitao Huang wrote: > > > > Hi Kai, Jarkko and Dave > > > > > > > > On Thu, 09 Mar 2023 05:31:29 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote: > > > > > > > > > > So I am still a little bit confused about where does "SGX driver uses > > > > > MAP_ANONYMOUS semantics for fd-based mmap()" come from. > > > > > > > > > > Anyway, we certainly don't want to break userspace. However, IIUC, > > > > > even from now on we change the driver to depend on userspace to pass > > > > > the correct pgoff in mmap(), this won't break userspace, because old > > > > > userspace which doesn't use fadvice() and pgoff actually doesn't > > > > > matter. For new userspace which uses fadvice(), it needs to pass the > > > > > correct pgoff. > > > > > > > > > > I am not saying we should do this, but it doesn't seem we can break > > > > > userspace? > > > > > > > > > > > > > Sorry for delayed update but I thought about this more and likely to > > > > propose a new EAUG ioctl for this and for enabling SGX-CET shadow stack > > > > pages. But regardless, I'd like to wrap up this discussion to just clarify > > > > this anonymous semantics design in documentation so people won't get > > > > confused in future. > > > > > > > > I think we all agree to keep this semantics so no user space would need > > > > specify 'offset' for mmap with enclave fd. And here is my proposed > > > > documentation changes. > > > > > > > > --- a/Documentation/x86/sgx.rst > > > > +++ b/Documentation/x86/sgx.rst > > > > @@ -100,6 +100,23 @@ pages and establish enclave page permissions. > > > > sgx_ioc_enclave_init > > > > sgx_ioc_enclave_provision > > > > > > > > +Enclave memory mapping > > > > +---------------------- > > > > + > > > > +A file descriptor created from opening **/dev/sgx_enclave** represents an > > > > +enclave object. The mmap() syscall with enclave file descriptors does not > > > > +support non-zero value for the 'offset' parameter. > > > > > > I think we all need to understand better why SGX driver requires anonymous > > > semantics mmap() against /dev/sgx_enclave, and as a result of that, requires > > > mmap() to pass 0 as pgoff (which looks wasn't even discussed when upstreaming > > > the driver). > > > > > > I'll do some investigation and try to summerize and report back. Thanks. > > > > > > > + Sean. > > > > Hi Sean, > > > > If you see this and have time, please help to comment. Thanks. > > > > I've spent plenty of time to look into the discussions around v20/v28/v29 and > > roughly v38/v39 to find out why SGX driver requires MAP_ANONYMOUS semantics, > > AFAICT it turns out it was never explicitly discussed. Or perhaps the > > "MAP_ANONYMOUS semantics" actually just means "MAP_SHARED | MAP_FIXED + pgoff is > > ignored", and everyone believed there was no need to explain what does "SGX > > driver uses MAP_ANONYMOUS semantics for mmap()" mean. > > > > Details: > > > > The v20 story (that I spent most of my time on) mentioned by Haitao was actually > > about how to make SGX and LSM work together but not related to SGX driver mmap() > > semantic. > > > > Also Haitao mentioned "the use of anonymous mapping can be traced back to v29" > > but this actually was just about how to use the first mmap() to "reserve the > > ELRANGE before ECREATE". It wasn't about to changing mmap(/dev/sgx_enclave) > > semantics at all. > > > > Sean actually suggested to explicitly document "how does SGX driver recommend > > the user to reserve ELRANGE", but Jarkko didn't think we should do: > > > > https://lore.kernel.org/linux-sgx/20200528111910.GB1666298@xxxxxxxxxxxxxxx/ > > > > which is a pity IMHO, because I believe for anyone, naturally, the first > > instinct to reserve ELRANGE is to use mmap(/dev/sgx_enclave) but not > > mmap(MAP_ANONYMOUS). If we suggest user to use the latter then there must be > > some reason and IMHO such suggestion and reason should be documented. > > Ya, the use of mmap() on fd=-1 is done in order to find an available, naturally > aligned chunk of virtual memory[*]. IIRC, there was a (very brief) discussion > about enhancing .mmap() so that userspace wouldn't be responsible for doing the > alignment, but I think we didn't pursue that idea very because we had bigger fish > to fry. Thanks for your time :) Yeah I noticed in v20, the sgx_get_unmapped_area() internally would allocate 2*len and then manually return the aligned address. This wouldn't work for mmap()ing the small chunks with an offset smaller than the size > > But I think this is unrelated to what you really care about, e.g. a userspace that > tightly controls its virtual memory could hardcode enclave placement (IIRC graphene > did/does do that). I.e. the alignment issue is a completely different discussion. > > [*] https://lore.kernel.org/all/20190522153836.GA24833@xxxxxxxxxxxxxxx Right. For the sake of making progress of this Haitao's series, we want to understand where did "SGX driver requires mmap(/dev/sgx_enclave) to use MAP_ANONYMOUS semantics" come from. But for this topic (how to reserve ELRANGE). I am not sure the reason that we prefer mmap(MAP_ANONYMOUS) still stands. For instance, the discussion in your above link was old implementation/assumption -- e.g., MAP_FIXED wasn't even used/supported for mmap()ing enclave chunks. So I am wondering now whether we suggest user to use mmap(MAP_ANONYMOUS) to get a size-aligned address still stands? The current SGX driver's get_unmapped_area: static unsigned long sgx_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { if ((flags & MAP_TYPE) == MAP_PRIVATE) return -EINVAL; if (flags & MAP_FIXED) return addr; return current->mm->get_unmapped_area(file, addr, len, pgoff, flags); } As you can see if we don't pass MAP_FIXED, which is the case for mmap() to reserve ELRANGE, looks there's no difference between mmap(MAP_ANONYMOUS) and mmap(/dev/sgx_enclave)? > > > Also, if I am not missing something, the current driver doesn't prevent using > > mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELANGE. So having clear > > documentation is helpful for SGX users to choose how to write their apps. > > > > Go back to the "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe > > this just is "SGX driver requires mmap() after ECREATE/EINIT to use MAP_SHARED | > > MAP_FIXED and pgoff is ignored". Or more precisely, pgoff is "not _used_ by SGX > > driver". > > > > In fact, I think "pgoff is ignored/not used" is technically wrong for enclave. > > Yeah, it's wrong. It works because, until now apparently, there was never a reason > a need to care about the file offset since ELRANGE base always provided the necessary > information. It wasn't a deliberate design choice, we simply overlooked that detail > (unless Jarkko was holding back on me all those years ago :-) ). Yeah. But I am not sure whether there are corner cases that we can have potential bug around here, since those VMA's are not aligned between the core-mm and the driver. I haven't thought carefully though. I guess from core-mm's perspective there are multi-VMAs mapping to the same/overlapping part of the enclave, while the SGX driver treats them as mapping to different non-overlapping parts. Perhaps it's OK as long as SGX driver doesn't use vma->pgoff to do something??? > > > IMHO we should stop saying SGX driver uses MAP_ANONYMOUS semantics, because the > > truth is it just takes advantage of MAP_FIXED and carelessly ignores the pgoff > > due to the nature of SGX w/o considering from core-MM's perspective. > > > > And IMHO there are two ways to fix: > > > > 1) From now on, we ask SGX apps to use the correct pgoff in their > > mmap(/dev/sgx_enclave). This shouldn't impact the existing SGX apps because SGX > > driver doesn't use vma->pgoff anyway. > > Heh, just "asking" won't help. And breaking userspace, i.e. requiring all apps > to update, will just get you yelled at :-) We can document properly and assume the new apps will follow. As I mentioned above, the old apps, which doesn't/shouldn't have been using fadvice() anyway, doesn't need to be changed, i.e., they should continue to work. :) No? > > > 2) For the sake of avoiding having to ask existing SGX apps to change their > > mmap()s, we _officially_ say that userspace isn't required to pass a correct > > pgoff to mmap() (i.e. passing 0 as did in existing apps), but the kernel should > > fix the vma->pgoff internally. > > I recommend you don't do this. Overwriting vm_pgoff would probably work, but it's > going to make a flawed API even messier. E.g. you'll have painted SGX into a > corner if you ever want to decouple vma->start/end from encl->base. I highly > doubt that will ever happen given how ELRANGE works, but I don't think a hack-a-fix > buys you enough to justify any more ugliness. I also found it's not feasible to cleanly fix the pgoff from userspace (I thought the pgoff could be fixed at very early stage of do_mmap() so everything later just worked, but it's not the case). In do_mmap() the pgoff from userspace is already used to VMA merging/splitting etc before creating the target VMA. Hmm.. Now I think we shouldn't silently fix pgoff in SGX driver as it may surprise the core-mm later because the core-mm has already done some job around VMAs before vma->pgoff is changed. > > > I do prefer option 2) because it has no harm to anyone: 1) No changes to > > existing SGX apps; 2) It aligns with the core-MM to so all existing mmap() > > operations should work as expected, meaning no surprise; 3) And this patchset > > from Haitao to use fadvice() to accelerate EAUG flow just works. > > I think you can have your cake and eat it too. IIUC, the goal is to get fadvise() > working, and to do that without creating a truly heinous uAPI, you need an accurate > vm_pgoff. So, use a carrot and stick approach. > > If userspace properly defines vm_pgoff during mmap() and doesn't specify MAP_ANONYMOUS, > then they get to use fadvise() (give 'em a carrot). But if *any* mmap() on the > enclave doesn't followo those rules, mark the enclave as tainted or whatever and > disallow fadvise() (hit 'em with a stick). If we are supposed to use mmap(MAP_ANONYMOUS) to reserve ELRANGE, then I think userspace will just use MAP_FIXED for all mmap()s to /dev/sgx_enclave. To detect whether a VMA has the correct pgoff, we need to somehow mark it in sgx_mmap(), but currently we don't have facility to do that. Even we can do, the core madvice() -> vfs_fadvice() cannot be aware of such VMA either, and in sgx_fadvice() we may have problem locating the correct VMA to do EAUG (because the vfs_fadvice() uses 0 pgoff to calculate the file offset). So, now I think we should just let userspace itself to pass the correct pgoff when it wants to use fadvice(), which is the option 1) I mentioned above. The kernel doesn't/shouldn't need to fix vma->pgoff. In another words, I think: - For the old apps, doesn't matter, continue to work; - For new apps which want to use fadvice() to accelerate EAUG, it should pass the correct pgoff in mmap(/dev/sgx_enclave) even with MAP_FIXED. And we don't say "SGX driver uses MAP_ANONYMOUS semantics for mmap(/dev/sgx_enclave) any more". Does this make sense?