Re: Unable to load large enclave

Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> · Wed, 7 Oct 2020 20:20:58 +0300

On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
> On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> >> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> >>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> >>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> >>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> >>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> >>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
> >>>>>>> test program at
> >>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> >>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
> >>>>>>>
> >>>>>>> I've tested this with v36, if there's reason to believe it has been
> >>>>>>> fixed I'd be happy to try it out on a newer patch set.
> >>>>>>
> >>>>>> I recommend using the v39-rc1 tag that I created for testing, because
> >>>>>> the API is reverted back to be compatible with v36.
> >>>>>
> >>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> >>>>> will be the same? Or did you fix the issue since v36?
> >>>>
> >>>> v37 and v38 have an API change that is reverted in v39:
> >>>>
> >>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@xxxxxxxxxxxxxxx/
> >>>>
> >>>> I'm not sure of the root cause yet, but you asked to try out a newer
> >>>> patch set and v39-rc1 is the best option.
> >>>>
> >>>> There was an off-by-one error in the enclave maximum size
> >>>> calculation, fixed in v37 (it was actually a bug in the SDM that the
> >>>> code inherited), but that should not result in the situation you just
> >>>> described.
> >>>
> >>> My money is on the XArray changes, that's the most notable change in v36 and
> >>> IIRC the only thing that touched EPC/memory management.
> >>
> >> Yeah, that's what we've been speculating for some days now. That's a
> >> somewhat outdated email. It all started to unravel when I asked
> >> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
> >> required to root cause the bug.
> > 
> > I ran the failing test and filtered SGX mmap's and ioctl's with this
> > eBPF script:
> > 
> > kretprobe:sgx_ioctl /retval != 0/
> > {
> >         printf("sgx_ioctl: %d\n", retval)
> > }
> > 
> > kretprobe:sgx_mmap /retval != 0/
> > {
> >         printf("sgx_mmap: %d\n", retval)
> > }
> > 
> > This results in zero positives, i.e. empty output, when run with bpftrace.
> > 
> > I'd go instead after RLIMIT_AS [*].
> > 
> > With these conclusions, I'm done with this bug.
> > 
> 
> How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?
> 
> Also, I can easily load a 1GB enclave with the old driver.
> 
> Also:
> 
> $ ulimit -v
> unlimited

➜  ~ (master) ✔ sudo bpftrace sgx_ret.bt
Attaching 3 probes...
ksys_mmap_pgoff: -12
^C

~ (master) ✔ cat sgx_ret.bt
kretprobe:sgx_ioctl /retval != 0/
{
        printf("sgx_ioctl: %d\n", retval)
}

kretprobe:sgx_mmap /retval != 0/
{
        printf("sgx_mmap: %d\n", retval)
}

kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
{
        printf("ksys_mmap_pgoff: %d\n", retval)
}

This shows that it fails before reaching sgx_mmap().
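
For reference, the -12 printed by the ksys_mmap_pgoff probe is a negated
errno value, and on Linux errno 12 is ENOMEM, which matches the failure
reported at the start of the thread. A minimal sketch for decoding such a
return value follows; the helper name describe_kernel_retval is
hypothetical and not part of the test program or the driver:

```python
import errno
import os


def describe_kernel_retval(retval: int) -> str:
    """Map a negative kernel-style return value (e.g. -12) to its errno name.

    Hypothetical helper for illustration only. Non-negative values are
    treated as success; negative values are looked up as -errno.
    """
    if retval >= 0:
        return "success"
    code = -retval
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The value printed by the kretprobe on ksys_mmap_pgoff above.
    print(describe_kernel_retval(-12))
```

The errno numbering is platform-specific, so this only makes sense when
run on the same kind of system the trace came from.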

/Jarkko