Re: [PATCH v2 1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS

"Jarkko Sakkinen" <jarkko@xxxxxxxxxx> · Wed, 15 May 2024 17:28:43 +0300

On Wed May 15, 2024 at 5:15 PM EEST, Dave Hansen wrote:
> On 5/15/24 06:54, Jarkko Sakkinen wrote:
> > I'd cut 90% of the description out and just make the argument of
> > the wrong error code, and done. The sequence is great for showing
> > how this could happen. The prose makes my head hurt tbh.
>
> The changelog is too long, but not fatally so.  I'd much rather have a
> super verbose description than something super sparse.
>
> Would something like this make more sense to folks?
>
> 	Imagine an mmap()'d file. Two threads touch the same address at
> 	the same time and fault. Both allocate a physical page and race
> 	to install a PTE for that page. Only one will win the race. The
> 	loser frees its page, but still continues handling the fault as
> 	a success and returns VM_FAULT_NOPAGE from the fault handler.
>
> 	The same race can happen with SGX. But there's a bug: the loser
> 	in the SGX case steers into a failure path. The loser EREMOVE's
> 	the winner's EPC page, then returns SIGBUS, likely killing the app.
>
> 	Fix the SGX loser's behavior. Change the return code to
> 	VM_FAULT_NOPAGE to avoid SIGBUS and call sgx_free_epc_page()
> 	which avoids EREMOVE'ing the winner's page and only frees the
> 	page that the loser allocated.

Yes!

I did read the whole thing. My comment was only about the chain
of maintainers who will eventually have to deal with this patch.

BR, Jarkko
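
For readers following the thread, the loser-handling rule in the
proposed changelog can be sketched in user-space C. This is only an
illustration under stated assumptions, not the kernel code: fault_ret,
pte_slot, handle_fault() and run_race() are hypothetical names, malloc()
stands in for EPC page allocation, and a compare-and-swap on one pointer
stands in for the PTE install. The two faults are issued back to back so
the outcome is deterministic; the second caller finds the slot populated
and takes the loser path.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's VM_FAULT_* codes. */
enum fault_ret { FAULT_NOPAGE, FAULT_SIGBUS };

/* Models the single PTE that both faulting threads try to populate. */
static _Atomic(void *) pte_slot;
/* Counts pages released by losers, for checking the sketch. */
static atomic_int pages_freed;

static enum fault_ret handle_fault(void)
{
	void *page = malloc(4096);	/* stand-in for allocating a page */
	void *expected = NULL;

	if (!page)
		return FAULT_SIGBUS;	/* a real failure, not a lost race */

	/* Try to install our page; only one caller can succeed. */
	if (!atomic_compare_exchange_strong(&pte_slot, &expected, page)) {
		/*
		 * Lost the race. Free only the page this caller
		 * allocated and leave the winner's page alone.
		 */
		free(page);
		atomic_fetch_add(&pages_freed, 1);
	}

	/* Winner and loser both report success: the mapping exists. */
	return FAULT_NOPAGE;
}

/* Runs two faults on the same slot; returns how many pages were freed. */
int run_race(enum fault_ret out[2])
{
	void *winner;

	atomic_store(&pte_slot, NULL);
	atomic_store(&pages_freed, 0);

	out[0] = handle_fault();	/* installs its page */
	out[1] = handle_fault();	/* finds the slot taken: the loser */

	winner = atomic_load(&pte_slot);
	free(winner);			/* tear down the "mapping" */
	atomic_store(&pte_slot, NULL);

	return atomic_load(&pages_freed);
}
```

The point the sketch is meant to show is that the loser's cleanup
touches only its own allocation and that its return value is the same
as the winner's.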