On Tue, 2023-06-06 at 04:11 +0000, Huang, Kai wrote:
> On Fri, 2023-05-26 at 19:32 -0500, Haitao Huang wrote:
> > Hi Kai, Jarkko and Dave
> >
> > On Thu, 09 Mar 2023 05:31:29 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
> > >
> > > So I am still a little bit confused about where does "SGX driver uses
> > > MAP_ANONYMOUS semantics for fd-based mmap()" come from.
> > >
> > > Anyway, we certainly don't want to break userspace. However, IIUC, even
> > > from now on we change the driver to depend on userspace to pass the
> > > correct pgoff in mmap(), this won't break userspace, because old
> > > userspace which doesn't use fadvice() and pgoff actually doesn't matter.
> > > For new userspace which uses fadvice(), it needs to pass the correct
> > > pgoff.
> > >
> > > I am not saying we should do this, but it doesn't seem we can break
> > > userspace?
> > >
> >
> > Sorry for delayed update but I thought about this more and likely to
> > propose a new EAUG ioctl for this and for enabling SGX-CET shadow stack
> > pages. But regardless, I'd like to wrap up this discussion to just clarify
> > this anonymous semantics design in documentation so people won't get
> > confused in future.
> >
> > I think we all agree to keep this semantics so no user space would need
> > specify 'offset' for mmap with enclave fd. And here is my proposed
> > documentation changes.
> >
> > --- a/Documentation/x86/sgx.rst
> > +++ b/Documentation/x86/sgx.rst
> > @@ -100,6 +100,23 @@ pages and establish enclave page permissions.
> >                 sgx_ioc_enclave_init
> >                 sgx_ioc_enclave_provision
> >
> > +Enclave memory mapping
> > +----------------------
> > +
> > +A file descriptor created from opening **/dev/sgx_enclave** represents an
> > +enclave object. The mmap() syscall with enclave file descriptors does not
> > +support non-zero value for the 'offset' parameter.
>
> I think we all need to understand better why SGX driver requires anonymous
> semantics mmap() against /dev/sgx_enclave, and as a result of that, requires
> mmap() to pass 0 as pgoff (which looks wasn't even discussed when upstreaming
> the driver).
>
> I'll do some investigation and try to summerize and report back. Thanks.
>

+ Sean.

Hi Sean,

If you see this and have time, please help to comment. Thanks.

I've spent plenty of time looking into the discussions around v20/v28/v29 and
roughly v38/v39 to find out why the SGX driver requires MAP_ANONYMOUS
semantics, and AFAICT it was never explicitly discussed. Or perhaps
"MAP_ANONYMOUS semantics" actually just means "MAP_SHARED | MAP_FIXED + pgoff
is ignored", and everyone believed there was no need to explain what "SGX
driver uses MAP_ANONYMOUS semantics for mmap()" means.

Details:

The v20 story mentioned by Haitao (which I spent most of my time on) was
actually about how to make SGX and LSM work together, not about the SGX
driver's mmap() semantics.

Also, Haitao mentioned "the use of anonymous mapping can be traced back to
v29", but this was actually just about how to use the first mmap() to "reserve
the ELRANGE before ECREATE". It wasn't about changing mmap(/dev/sgx_enclave)
semantics at all.

Sean actually suggested explicitly documenting "how does SGX driver recommend
the user to reserve ELRANGE", but Jarkko didn't think we should:

https://lore.kernel.org/linux-sgx/20200528111910.GB1666298@xxxxxxxxxxxxxxx/

which is a pity IMHO, because I believe that for anyone the natural first
instinct for reserving ELRANGE is to use mmap(/dev/sgx_enclave) rather than
mmap(MAP_ANONYMOUS). If we suggest that users use the latter then there must
be some reason, and IMHO both the suggestion and the reason should be
documented.

Also, if I am not missing something, the current driver doesn't prevent using
mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELRANGE.
So having clear documentation helps SGX users choose how to write their apps.

Going back to "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
this just means "SGX driver requires mmap() after ECREATE/EINIT to use
MAP_SHARED | MAP_FIXED, and pgoff is ignored". Or more precisely, pgoff is
"not _used_ by the SGX driver".

In fact, I think "pgoff is ignored/not used" is technically wrong for an
enclave.

Ignoring pgoff makes sense for MAP_SHARED | MAP_ANONYMOUS, because you get a
new shmem file every time you do so. But this isn't the case for an enclave.
For all mmap()s against the same enclave, pgoff has a valid meaning. The SGX
driver doesn't use vma->pgoff, so from the driver's side it's OK not to have a
valid vma->pgoff, but this confuses core-MM, because now we can easily end up
with multiple VMAs mapping different parts of the enclave while core-MM
believes they all map the start of the enclave. For instance, have we tested
all the corner cases around VMA splitting/merging, etc.?

To conclude:

IMHO we should stop saying the SGX driver uses MAP_ANONYMOUS semantics,
because the truth is that it just takes advantage of MAP_FIXED and carelessly
ignores pgoff due to the nature of SGX, without considering core-MM's
perspective.

And IMHO there are two ways to fix this:

1) From now on, we ask SGX apps to use the correct pgoff in their
mmap(/dev/sgx_enclave). This shouldn't impact existing SGX apps because the
SGX driver doesn't use vma->pgoff anyway.

2) To avoid having to ask existing SGX apps to change their mmap()s, we
_officially_ say that userspace isn't required to pass a correct pgoff to
mmap() (i.e. it can keep passing 0 as existing apps do), but the kernel
should fix up vma->pgoff internally.
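To illustrate the core-MM concern, here is a small userspace model (not
kernel code) of how a page index is derived from a VMA's start address and
pgoff, in the spirit of the kernel's linear_page_index(), and of what the
option 2) fixup would compute. struct vma_model and every *_model function
name are made up for this sketch, the field is spelled vm_pgoff as in struct
vm_area_struct, and 4K pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12 /* assumption: 4K pages */

/* Hypothetical stand-in for the few vm_area_struct fields that matter here. */
struct vma_model {
	uint64_t vm_start; /* first mapped address (inclusive) */
	uint64_t vm_end;   /* end of the mapping (exclusive) */
	uint64_t vm_pgoff; /* offset into the backing object, in pages */
};

/* Number of pages covered by the VMA. */
uint64_t vma_pages_model(const struct vma_model *v)
{
	return (v->vm_end - v->vm_start) >> PAGE_SHIFT_MODEL;
}

/* Page index in the backing object that this VMA claims for addr. */
uint64_t page_index_model(const struct vma_model *v, uint64_t addr)
{
	assert(addr >= v->vm_start && addr < v->vm_end);
	return v->vm_pgoff + ((addr - v->vm_start) >> PAGE_SHIFT_MODEL);
}

/*
 * Offset-contiguity condition for two VMAs of the same file: they must be
 * adjacent in the address space AND in the backing object.
 */
bool pgoff_contiguous_model(const struct vma_model *prev,
			    const struct vma_model *next)
{
	return prev->vm_end == next->vm_start &&
	       prev->vm_pgoff + vma_pages_model(prev) == next->vm_pgoff;
}

/*
 * Sketch of option 2): ignore whatever userspace passed and derive pgoff
 * from the VMA's position relative to the enclave base address.
 */
void fixup_pgoff_model(struct vma_model *v, uint64_t encl_base)
{
	assert(v->vm_start >= encl_base);
	v->vm_pgoff = (v->vm_start - encl_base) >> PAGE_SHIFT_MODEL;
}
```

With two adjacent VMAs that both carry pgoff 0, page_index_model() reports
index 0 for the first page of each and pgoff_contiguous_model() is false.
After fixup_pgoff_model() on both, the indexes are relative to the enclave
base and the two VMAs look contiguous.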
I do prefer option 2) because it harms no one: 1) No changes to existing SGX
apps; 2) It aligns with core-MM, so all existing mmap() operations should
work as expected, meaning no surprises; 3) And this patchset from Haitao to
use fadvise() to accelerate the EAUG flow just works.

And I believe we should document all of this so everyone can understand.
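For reference, the userspace side of the pattern discussed in this thread
(reserve ELRANGE first, then map on top of it with MAP_SHARED | MAP_FIXED)
could look roughly like the sketch below. The helper names
reserve_aligned_range() and map_over_range() are hypothetical, and the
sketch assumes the range size is a power of two no smaller than the page
size, with the base naturally aligned to that size. The pass_real_offset
flag only contrasts what existing apps do (pass 0) with what option 1) would
ask for (pass the offset from the range base); it does not describe what the
driver does with the value.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Reserve `size` bytes of address space, aligned to `size`, using an
 * anonymous PROT_NONE mapping. Returns NULL on failure.
 */
void *reserve_aligned_range(size_t size)
{
	/* Precondition for this sketch: non-zero power of two. */
	assert(size != 0 && (size & (size - 1)) == 0);
	if (size > SIZE_MAX / 2)
		return NULL;

	/* Over-allocate so that an aligned sub-range must exist inside. */
	size_t span = size * 2;
	uint8_t *raw = mmap(NULL, span, PROT_NONE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	uintptr_t mask = (uintptr_t)size - 1;
	uint8_t *aligned = (uint8_t *)(((uintptr_t)raw + mask) & ~mask);

	/* Give back the unused head and tail of the over-allocation. */
	if (aligned > raw)
		munmap(raw, (size_t)(aligned - raw));
	if (raw + span > aligned + size)
		munmap(aligned + size, (size_t)((raw + span) - (aligned + size)));

	return aligned;
}

/*
 * Map `len` bytes of `fd` over the reserved range at byte offset `off`
 * from `base`. Returns NULL on failure.
 */
void *map_over_range(void *base, size_t off, size_t len, int prot, int fd,
		     int pass_real_offset)
{
	/* Existing apps pass 0; option 1) would pass the real offset. */
	off_t file_off = pass_real_offset ? (off_t)off : 0;
	void *p = mmap((uint8_t *)base + off, len, prot,
		       MAP_SHARED | MAP_FIXED, fd, file_off);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would reserve the range once, create and initialize the enclave at
that base, and then call map_over_range() with the enclave fd for each
region it wants mapped.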