On Wed May 15, 2024 at 5:15 PM EEST, Dave Hansen wrote:
> On 5/15/24 06:54, Jarkko Sakkinen wrote:
> > I'd cut out 90% of the description out and just make the argument of
> > the wrong error code, and done. The sequence is great for showing
> > how this could happen. The prose makes my head hurt tbh.
>
> The changelog is too long, but not fatally so. I'd much rather have a
> super verbose description than something super sparse.
>
> Would something like this make more sense to folks?
>
>     Imagine an mmap()'d file. Two threads touch the same address at
>     the same time and fault. Both allocate a physical page and race
>     to install a PTE for that page. Only one will win the race. The
>     loser frees its page, but still continues handling the fault as
>     a success and returns VM_FAULT_NOPAGE from the fault handler.
>
>     The same race can happen with SGX. But there's a bug: the loser
>     in the SGX steers into a failure path. The loser EREMOVE's the
>     winner's EPC page, then returns SIGBUS, likely killing the app.
>
>     Fix the SGX loser's behavior. Change the return code to
>     VM_FAULT_NOPAGE to avoid SIGBUS and call sgx_free_epc_page()
>     which avoids EREMOVE'ing the winner's page and only frees the
>     page that the loser allocated.

Yes! I did read the whole thing. My comment was only related to
the chain of maintainers who also have to deal with this patch
eventually.

BR, Jarkko