On 2024/2/26 18:29, Borislav Petkov wrote:
> On Sat, Feb 24, 2024 at 02:08:42PM +0800, Shuai Xue wrote:
>> @Borislav, do you have any other concerns?
>
> Yes, this change needs to be further reviewed by an ARM person: I have
> no clue what those "abnormal synchronous errors" on ARM are

Hi, Borislav,

Maybe the word `abnormal` is inaccurate and misled you. What I mean is
the precondition checks done before memory_failure_queue():

- `if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))` in ghes_handle_memory_failure()
- `if (flags == -1)` in ghes_handle_memory_failure()
- `if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))` in ghes_do_memory_failure()
- `if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr))` in ghes_do_memory_failure()

If any of these preconditions is not met, memory_failure_queue() is
never called, and the user-space process will trigger the SEA again.
This loop can potentially exceed the platform firmware threshold or
even trigger a kernel hard lockup, leading to a system reboot.

> and how
> they're supposed to be handled properly there:
>
> - what happens if you get such an error when ghes is disabled there?

If ghes_disable is set, the GHES driver is not initialized by
acpi_ghes_init(), so none of the error notifications will be handled.
IMHO, this is expected.

>
> - is that even the right place to handle them?
>
> James?
>

I'll leave this to @James.

Thank you.

Best Regards,
Shuai