Re: Unable to load large enclave

Jethro Beekman <jethro@xxxxxxxxxxxx> · Wed, 7 Oct 2020 20:14:48 +0200

On 2020-10-07 19:20, Jarkko Sakkinen wrote:
> On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
>> On 2020-10-07 17:49, Jarkko Sakkinen wrote:
>>> On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
>>>> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
>>>>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
>>>>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
>>>>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
>>>>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
>>>>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
>>>>>>>>> test program at
>>>>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
>>>>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
>>>>>>>>>
>>>>>>>>> I've tested this with v36, if there's reason to believe it has been
>>>>>>>>> fixed I'd be happy to try it out on a newer patch set.
>>>>>>>>
>>>>>>>> I recommend using the v39-rc1 tag that I created for testing, because the
>>>>>>>> API has been reverted to be compatible with v36.
>>>>>>>
>>>>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
>>>>>>> will be the same? Or did you fix the issue since v36?
>>>>>>
>>>>>> v37 and v38 have an API change that is reverted in v39:
>>>>>>
>>>>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@xxxxxxxxxxxxxxx/
>>>>>>
>>>>>> I'm not sure of the root cause yet, but you asked to try out a newer
>>>>>> patch set and v39-rc1 is the best option.
>>>>>>
>>>>>> There was an off-by-one error in the enclave maximum size calculation,
>>>>>> fixed in v37 (it was actually a bug in the SDM inherited by the code),
>>>>>> but that should not result in the situation you just described.
>>>>>
>>>>> My money is on the XArray changes, that's the most notable change in v36 and
>>>>> IIRC the only thing that touched EPC/memory management.
>>>>
>>>> Yeah, that's what we've been speculating for some days now. That's
>>>> a somewhat outdated email. It all started to unfold when I asked
>>>> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
>>>> required to root cause the bug.
>>>
>>> I ran the failing test and filtered SGX mmaps and ioctls with this
>>> eBPF script:
>>>
>>> kretprobe:sgx_ioctl /retval != 0/
>>> {
>>>         printf("sgx_ioctl: %d\n", retval)
>>> }
>>>
>>> kretprobe:sgx_mmap /retval != 0/
>>> {
>>>         printf("sgx_mmap: %d\n", retval)
>>> }
>>>
>>> This results in zero positives, i.e. empty output, when run with bpftrace.
>>>
>>> I'd go instead after RLIMIT_AS [*].
>>>
>>> With these conclusions, I'm done with this bug.
>>>
>>
>> How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmapping the individual pages over it.
>>
>> Also, I can easily load a 1GB enclave with the old driver.
>>
>> Also:
>>
>> $ ulimit -v
>> unlimited
> 
> ➜  ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
> 
> ~ (master) ✔ cat sgx_ret.bt
> kretprobe:sgx_ioctl /retval != 0/
> {
>         printf("sgx_ioctl: %d\n", retval)
> }
> 
> kretprobe:sgx_mmap /retval != 0/
> {
>         printf("sgx_mmap: %d\n", retval)
> }
> 
> kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
> {
>         printf("ksys_mmap_pgoff: %d\n", retval)
> }
> 
> This shows that it fails before reaching sgx_mmap().
> 
> /Jarkko
> 

It's this one in do_mmap():

	/* Too many mappings? */
	if (mm->map_count > sysctl_max_map_count)
		return -ENOMEM;

I've verified that the problem goes away when I increase /proc/sys/vm/max_map_count. Why do I now need to raise this above the default, when loading the same enclave with the old driver did not require it?
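
For anyone who wants to check how close a process is to this limit, here is a minimal sketch in Python. It assumes the line count of /proc/<pid>/maps approximates mm->map_count (it may be off by a special entry or two), that each individually mmapped enclave page ends up as its own mapping, and that the default limit is 65530. The function names and the constant are illustrative only and do not come from the kernel or the SGX driver.

```python
from pathlib import Path

# Assumption: the stock kernel default for vm.max_map_count.
DEFAULT_MAX_MAP_COUNT = 65530


def count_mappings(maps_text: str) -> int:
    """Count mappings in /proc/<pid>/maps-style text (one non-empty line each)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current limit from the sysctl file."""
    return int(Path(path).read_text().strip())


def mmap_would_fail(map_count: int, max_map_count: int) -> bool:
    """Mirror the do_mmap() check quoted above: fail once the count exceeds the limit."""
    return map_count > max_map_count


def report(pid: str = "self") -> str:
    """Summarise mapping usage for one process (hypothetical helper)."""
    used = count_mappings(Path(f"/proc/{pid}/maps").read_text())
    limit = read_max_map_count()
    return f"mappings: {used}, limit: {limit}, headroom: {limit - used}"
```

Calling print(report()) while the loader is running shows the remaining headroom. If the default of 65530 is in effect, 0xffd6 (65494) per-page mappings would leave only 36 for the rest of the process, which would be consistent with the point at which the test fails.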

--
Jethro Beekman | Fortanix
