On Wed, Feb 12, 2025 at 10:18:11AM +1300, Huang, Kai wrote:
>
>
> On 12/02/2025 10:03 am, Jarkko Sakkinen wrote:
> > On Tue, Feb 11, 2025 at 08:25:58AM -0800, Dave Hansen wrote:
> > > > arch_memory_failure() but stay on sgx_active_page_list.
> > > > page->poison is not checked in the reclaimer logic meaning that a page could be
> > > > reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the
> > > > firmware receiving an MCE in one of those operations and going into
> > > > "unbreakable shutdown" and triggering a kernel panic on remaining cores.
> > >
> > > This requires low-level SGX implementation knowledge to fully
> > > understand. Both what "ETRACK, EBLOCK and EWB" are in the first place,
> > > how they are involved in reclaim and also why EREMOVE doesn't lead to
> > > the same fate.
> >
> > Does it? [I'll dig up Intel SDM to check this]
> >
>
> I just did. :-)
>
> It seems EREMOVE only reads and updates the EPCM entry for the target EPC
> page but won't actually access that EPC page.

That was fast, thank you!

This is pretty much also what should be explicitly stated in the commit
message.

BR, Jarkko