Re: [RFC PATCH 0/4] SGX shmem backing store issue

Dave Hansen <dave.hansen@xxxxxxxxx> · Wed, 4 May 2022 16:50:54 -0700

On 5/4/22 16:36, Reinette Chatre wrote:
> The page fault handler cannot check for SGX_ENCL_PAGE_BEING_RECLAIMED
> until the reclaimer releases the mutex.

Ahh, got it now.  I thought the mutex wasn't held on the reclaimer side.
 Thanks for the refresher.

>> One other thing.  The role of encl->lock here is very important.
>> Without it, two concurrent page faults could do their individual
>> memset(), each see !pcmd_page_empty, then decline to truncate the page.
> 
> Your argument is not clear to me. Yes, this would be an issue (and
> that will not be the only issue) if the enclave mutex is not held
> by the page fault handler ... but the enclave mutex _is_ held by the
> page fault handler.

It would just be nice to clearly state the locking logic in the eventual
changelog.

>> Also, given the challenges here, I do think we should check the
>> pcmd_page after truncate to ensure it is still all zero's.
> 
> Could you please elaborate the challenges? I do not think it would
> help the issue at hand at all to add a check since this is done with
> the mutex held while the reclaimer has a reference to the page but no
> mutex ... as long as the page fault handler has that mutex it can write
> and check the PCMD page all it want, the new changes would occur to the
> PCMD page when it (the page fault handler) releases the mutex and the
> reclaimer attempts to use the page it has a reference to.

I'm just asking for a debugging check that looks at the page *after* is
has been truncated to ensure it is zero, in case another bug creeps in here.