On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
> On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> >> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> >>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> >>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> >>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> >>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> >>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
> >>>>>>> test program at
> >>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> >>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
> >>>>>>>
> >>>>>>> I've tested this with v36, if there's reason to believe it has been
> >>>>>>> fixed I'd be happy to try it out on a newer patch set.
> >>>>>>
> >>>>>> I recommend using v39-rc1 tag that I created for testing because API is
> >>>>>> reverted back to be compatible with v36.
> >>>>>
> >>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> >>>>> will be the same? Or did you fix the issue since v36?
> >>>>
> >>>> v37 and v38 has an API change that is reverted in v39:
> >>>>
> >>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@xxxxxxxxxxxxxxx/
> >>>>
> >>>> I'm not sure of the root cause yet but you asked to try to out a newer
> >>>> patch set and v39-rc1 is the best option.
> >>>>
> >>>> There was off-by-one error in enclave maximum size calculation fixed in
> >>>> v37 (it was actually a bug in SDM inherited to the code) but that should
> >>>> not result the situation you just described.
> >>>
> >>> My money is on the XArray changes, that's the most notable change in v36 and
> >>> IIRC the only thing that touched EPC/memory management.
> >>
> >> Yeah, that's what we've been speculating for some days now. That's
> >> somewhat deprecated email. It all started to enroll when I asked
> >> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
> >> required to root cause the bug.
> >
> > I run the failing test and filtered SGX mmap's and ioctl's with this
> > eBPF script:
> >
> > kretprobe:sgx_ioctl /retval != 0/
> > {
> >     printf("sgx_ioctl: %d\n", retval)
> > }
> >
> > kretprobe:sgx_mmap /retval != 0/
> > {
> >     printf("sgx_mmap: %d\n", retval)
> > }
> >
> > This results zero positives, i.e. empty output, when run with bpftrace.
> >
> > I'd go instead after RLIMIT_AS [*].
> >
> > With these conclusions, I'm done with this bug.
> >
>
> How can it be RLIMIT_AS? With the current flow, you mmap the whole range
> before mmaping the individual pages over it?
>
> Also, I can easily load a 1GB enclave with the old driver.
>
> Also:
>
> $ ulimit -v
> unlimited

➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
Attaching 3 probes...
ksys_mmap_pgoff: -12
^C

~ (master) ✔ cat sgx_ret.bt
kretprobe:sgx_ioctl /retval != 0/
{
    printf("sgx_ioctl: %d\n", retval)
}

kretprobe:sgx_mmap /retval != 0/
{
    printf("sgx_mmap: %d\n", retval)
}

kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
{
    printf("ksys_mmap_pgoff: %d\n", retval)
}

This shows that it fails before reaching sgx_mmap().

/Jarkko
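
For anyone reproducing this, here is a minimal sketch (not part of the
original test or the driver) of how to decode the -12 reported for
ksys_mmap_pgoff and to read RLIMIT_AS from inside a process, which is the
limit that "ulimit -v" reports. The helper names describe_retval() and
address_space_limit() are made up for illustration, and the errno numbering
assumes Linux.

```python
import errno
import os
import resource


def describe_retval(retval: int) -> str:
    """Map a negative kernel-style return value (e.g. -12) to its errno name."""
    code = -retval
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def address_space_limit() -> str:
    """Return the soft RLIMIT_AS in the units `ulimit -v` uses (kB)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft // 1024} kB"


if __name__ == "__main__":
    # -12 is the value printed by the ksys_mmap_pgoff kretprobe above.
    print(describe_retval(-12))
    print("RLIMIT_AS:", address_space_limit())
```

On Linux, errno 12 is ENOMEM, which matches the error the test program
reports. An "unlimited" RLIMIT_AS only rules out that one limit; it does not
say which other check in the mmap path returned the error.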