On Tue, 2022-11-01 at 02:46 +0200, jarkko@xxxxxxxxxx wrote:
> On Mon, Oct 24, 2022 at 09:32:13AM +0800, Zhiquan Li wrote:
> >
> > On 2022/10/24 04:39, jarkko@xxxxxxxxxx wrote:
> > > > As you can see if the EPC page has already been populated at a given index of
> > > > one virtual EPC instance, the current fault handler just assumes the mapping is
> > > > already there and returns success immediately. This causes a bug when one
> > > > virtual EPC instance is shared by multi processes via fork(): if the EPC page at
> > > > one index is already populated by the parent process, when the child accesses
> > > > the same page using different virtual address, the fault handler just returns
> > > > success w/o actually setting up the mapping for the child, resulting in endless
> > > > page fault.
> > > >
> > > > This needs to be fixed in no matter what way.
> > > I think you mean that vm_insert_pfn() does not happen for child because
> > > of early return? I did not understand the part about "different virtual
> > > addresses", as it is the same mapping.
> > >
> >
> > If userspace do something like this, the child will get "different
> > virtual address":
> >
> >   ... parent run enclave within VM ...
> >   if (fork() == 0) {
> >           int *caddr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, vepc_fd, 0);
> >           printf("child: %d\n", caddr[0]);
> >   }
> >
> >
> > - "vepc_fd" is inherited from parent which had opened /dev/sgx_vepc.
> > - mmap() will create a VMA in child with "different virtual addresses".
> > - "caddr[0]" will cause a page fault as it's a new mapping.
> >
> > 1. Then kernel will run into the code snippet referenced by Kai.
> > 2. The early return 0 will result in sgx_vepc_fault() return
> >    "VM_FAULT_NOPAGE".
> > 3. This return value will make the condition as true at
> >    function do_user_addr_fault()
> >
> >    if (likely(!(fault & VM_FAULT_ERROR)))
> >        return;
> >
> > 4. Since this page fault has not been handled and "$RIP" is still the
> >    original value, it will result in the same page fault again. Namely,
> >    it's an endless page fault.
> >
> > But the problem is neither the early return in __sgx_vepc_fault() nor
> > the return of VM_FAULT_NOPAGE at sgx_vepc_fault(). The root cause has
> > been highlighted by Kai, one virtual EPC instance
> > can only be mmap()-ed by the process which opens /dev/sgx_vepc.
> >
> > In fact, to share a virtual EPC instance in userspace doesn't make any
> > sense. Even though it can be shared by child, the virtual EPC page
> > cannot be used by child correctly.
>
> OK, makes sense, thanks for the explanation!
>
> Why would we want to enforce for user space not to do this, even
> if it does cause malfunctioning program?
>
> BR, Jarkko

Hi Jarkko, Dave,

I've been re-thinking this #MC handling on virtual EPC by stepping back
to the beginning, and I think we have more problems than just "whether
the kernel should enforce that a child cannot mmap() virtual EPC".

First of all, if we want to use epc->owner to carry the userspace virtual
address, "make the kernel enforce that a child cannot mmap() virtual EPC"
alone isn't good enough -- nothing prevents userspace from calling mmap()
several times to map the same virtual EPC pages. So additionally, we also
need to "make the kernel enforce that one virtual EPC can only be
mmap()-ed once".

Secondly, I am thinking that the current arch_memory_failure() cannot
really handle #MC for a virtual EPC page correctly. The problem is, even
if we mark the page as poisoned and send a signal to userspace so that it
injects #MC into the guest, the poisoned virtual EPC page is never
unmapped from the guest and then freed. This means a malicious guest can
always try to use the poisoned EPC page again after it receives #MC on
some EPC page.
I am not entirely sure what kind of behaviour/attack is possible in such
a case, but it seems the right behaviour should be for KVM to inject the
#MC and unmap the poisoned EPC page from the guest. And if the guest ever
tries to use this "guest's EPC page" (GFN) again, KVM should kill the
guest.

Hi Sean,

If you ever see this, could you also comment?
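
To make the endless-fault scenario and the "mmap()-ed once" rule discussed
above a bit more concrete, here is a toy userspace model in C. It is only
a sketch and not kernel code: struct toy_vepc and all toy_* functions are
hypothetical names made up for illustration, and the two arrays merely
stand in for the per-instance page array and the per-process page tables.
The "opener only, once" check in toy_mmap() is one possible policy under
those assumptions, not a proposed patch.

```c
/* Toy userspace model, NOT kernel code. Every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VEPC_PAGES 4    /* pages per toy vEPC instance */
#define MAX_PROCS  2    /* 0 = parent, 1 = child */

struct toy_vepc {
	bool populated[VEPC_PAGES];         /* stands in for the per-instance page array */
	bool mapped[MAX_PROCS][VEPC_PAGES]; /* stands in for each process's page tables */
	int  owner;                         /* process that opened the instance */
	bool mmapped;                       /* set once the instance has been mmap()-ed */
};

/* Start with an empty instance owned by 'proc'. */
void toy_vepc_open(struct toy_vepc *v, int proc)
{
	*v = (struct toy_vepc){ .owner = proc };
}

/*
 * Models the early return described in the thread: if the page at 'idx'
 * is already populated, report success without installing any mapping
 * for the faulting process.
 */
bool toy_fault(struct toy_vepc *v, int proc, size_t idx)
{
	assert(proc >= 0 && proc < MAX_PROCS && idx < VEPC_PAGES);

	if (v->populated[idx])
		return true;            /* "handled", but 'proc' gets no mapping */

	v->populated[idx] = true;
	v->mapped[proc][idx] = true;
	return true;
}

/*
 * Retry the access until the page is mapped for 'proc'. Returns the number
 * of faults taken, or -1 if the page is still unmapped after 'max_retries'
 * retries (the toy equivalent of an endless page fault).
 */
int toy_access(struct toy_vepc *v, int proc, size_t idx, int max_retries)
{
	for (int faults = 0; faults <= max_retries; faults++) {
		if (v->mapped[proc][idx])
			return faults;
		toy_fault(v, proc, idx);
	}
	return -1;
}

/*
 * One possible policy for the two restrictions discussed: only the opener
 * may mmap() the instance, and only once.
 */
bool toy_mmap(struct toy_vepc *v, int proc)
{
	if (proc != v->owner || v->mmapped)
		return false;
	v->mmapped = true;
	return true;
}
```

With the parent as process 0 and the child as process 1, the parent's
toy_access(&v, 0, 0, 1000) returns after one fault, while the child's
toy_access(&v, 1, 0, 1000) gives up with -1 because the page is already
populated and the handler returns early. toy_mmap() refuses both a second
mmap() by the parent and any mmap() by the child.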