On Fri, May 17, 2019 at 11:33:30AM -0700, Linus Torvalds wrote:
> On Fri, May 17, 2019 at 11:21 AM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > I agree that conceptually EPC is private memory, but because EPC is
> > managed as a separate memory pool, SGX tags it VM_PFNMAP and manually
> > inserts PFNs, i.e. EPC effectively gets classified as IO memory.
> >
> > And vmf_insert_pfn_prot() doesn't like writable private IO mappings:
> >
> >    BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
>
> Hmm. I haven't looked into why you want to do your own page insertion
> and not just "use existing pages", but I'm sure there's some reason.

Outside of the SGX subsystem, the kernel is unaware of EPC memory, e.g.
BIOS enumerates it as reserved memory in the e820 tables, or not at all.

On current hardware, EPC is backed by system memory, but it's protected
by range registers (and other stuff) and can't be accessed directly
except when the CPU is in "enclave mode", i.e. executing an enclave in
CPL3. To execute an enclave it must first be built, and because EPC
memory can't be written outside of enclave mode, the only way to build
the enclave is via dedicated CPL0 ISA, e.g. ENCLS[EADD].

> It looks like the "shared vs private" inode part is a red herring,
> though. You might as well give each opener of the sgx node its own
> inode - and you probably should. Then you can keep track of the pages
> that have been added in the inode->i_mapping, and you could avoid the
> whole PFN thing entirely. I still am not a huge fan of the device node
> in the first place, but I guess it's just one more place where a
> system admin can then give (or deny) access to a kernel feature from
> users. I guess the kvm people do the same thing, for not necessarily
> any better reasons.
>
> With the PFNMAP model I guess the SGX memory ends up being unswappable
> - at least done the obvious way.

EPC memory is swappable on its own terms, e.g. pages can be swapped from
EPC to system RAM and vice versa, but again moving pages in and out of
the EPC can only be done through dedicated CPL0 ISA. And there are
additional TLB flushing requirements, evicted pages need to be
refcounted against the enclave, evicted pages need an anchor in the EPC
to ensure freshness, etc...

Long story short, we decided to manage EPC in the SGX subsystem as a
separate memory pool rather than modify the kernel's MMU to teach it how
to deal with EPC.

> Again, the way I'd expect it to be done is as a shmem inode - that
> would I think be a better model. But I think that's a largely internal
> design decision, and the device node could just do that eventually
> (and the mmap could just map the populated shmem information into
> memory, no PFNMAP needed - the inode and the mapping could be
> "read-only" as far as the _user_ is concerned, but the i_mapping then
> gets populated by the ioctl's).
>
> I have not actually looked at any of the SGX patches, so maybe you're
> already doing something like that (although the PFNMAP comment makes
> me think not), and quite possibly there's some fundamental reason why
> you can't just use the shmem approach.
>
> So my high-level reaction here may be just the rantings of somebody
> who just isn't familiar with what you do. My "why not shmem and
> regular mmap" questions come from a 30000ft view without knowing any
> of the details.
>
> Linus
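
For reference, the BUG_ON() quoted at the top of this mail can be modeled in user space roughly as below. This is only a sketch, not the SGX or mm code: the flag values are copied from include/linux/mm.h so that it builds standalone, is_cow_mapping() is assumed to match the kernel's definition (private and potentially writable), and pfnmap_insert_would_bug() and insert_pfn_checked() are hypothetical helpers that exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in include/linux/mm.h, redefined so this builds in user space. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL
#define VM_PFNMAP   0x00000400UL

/* A mapping is COW if it is private (no VM_SHARED) and may become writable. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical helper: true when the quoted BUG_ON() condition holds. */
static bool pfnmap_insert_would_bug(unsigned long vm_flags)
{
	return (vm_flags & VM_PFNMAP) && is_cow_mapping(vm_flags);
}

/* Hypothetical stand-in for the insertion path; assert() plays BUG_ON(). */
static void insert_pfn_checked(unsigned long vm_flags)
{
	assert(!pfnmap_insert_would_bug(vm_flags));
	/* A real implementation would install the PFN here. */
}
```

With these definitions, a VM_PFNMAP vma that is private and may be written (VM_MAYWRITE without VM_SHARED) trips the check, while the same vma with VM_SHARED set does not.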