Re: [PATCH v2 2/2] ACPI: APEI: handle synchronous exceptions in task work

HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx> · Fri, 17 Mar 2023 00:29:24 +0000

On Thu, Mar 16, 2023 at 07:10:56PM +0800, Shuai Xue wrote:
> 
> 
> On 2023/3/16 PM3:21, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Feb 27, 2023 at 01:03:15PM +0800, Shuai Xue wrote:
> >> Hardware errors could be signaled by synchronous interrupt, e.g.  when an
> >> error is detected by a background scrubber, or signaled by synchronous
> >> exception, e.g. when an uncorrected error is consumed. Both synchronous and
> >> asynchronous error are queued and handled by a dedicated kthread in
> >> workqueue.
> >>
> >> commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for
> >> synchronous errors") keep track of whether memory_failure() work was
> >> queued, and make task_work pending to flush out the workqueue so that the
> >> work for synchronous error is processed before returning to user-space.
> >> The trick ensures that the corrupted page is unmapped and poisoned. And
> >> after returning to user-space, the task starts at current instruction which
> >> triggering a page fault in which kernel will send SIGBUS to current process
> >> due to VM_FAULT_HWPOISON.
> >>
> >> However, the memory failure recovery for hwpoison-aware mechanisms does not
> >> work as expected. For example, hwpoison-aware user-space processes like
> >> QEMU register their customized SIGBUS handler and enable early kill mode by
> >> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
> >> the process by sending a SIGBUS signal in memory failure with wrong
> >> si_code: the actual user-space process accessing the corrupt memory
> >> location, but its memory failure work is handled in a kthread context, so
> >> it will send SIGBUS with BUS_MCEERR_AO si_code to the actual user-space
> >> process instead of BUS_MCEERR_AR in kill_proc().
> >>
> >> To this end, separate synchronous and asynchronous error handling into
> >> different paths like X86 platform does:
> >>
> >> - task work for synchronous errors.
> >> - and workqueue for asynchronous errors.
> >>
> >> Then for synchronous errors, the current context in memory failure is
> >> exactly belongs to the task consuming poison data and it will send SIBBUS
> >> with proper si_code.
> >>
> >> Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
> >> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
> > ...
> >>  
> >>  /*
> >> - * Called as task_work before returning to user-space.
> >> - * Ensure any queued work has been done before we return to the context that
> >> - * triggered the notification.
> >> + * struct mce_task_work - for synchronous RAS event
> > 
> > This seems to handle synchronous memory errors, not limited to MCE?
> > So naming this struct as such (more generally) might be better.
> 
> Yes. How about `sync_task_work`?

Sounds better to me.

> 
> > 
> >> + *
> >> + * @twork:                callback_head for task work
> >> + * @pfn:                  page frame number of corrupted page
> >> + * @flags:                fine tune action taken
> >> + *
> >> + * Structure to pass task work to be handled before
> >> + * ret_to_user via task_work_add().
> >>   */
> > ...
> > 
> >>  }
> >>  
> >> -static bool ghes_do_memory_failure(u64 physical_addr, int flags)
> >> +static void ghes_do_memory_failure(u64 physical_addr, int flags)
> >>  {
> >>  	unsigned long pfn;
> >> +	struct mce_task_work *twcb;
> >>  
> >>  	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
> >> -		return false;
> >> +		return;
> >>  
> >>  	pfn = PHYS_PFN(physical_addr);
> >>  	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
> >>  		pr_warn_ratelimited(FW_WARN GHES_PFX
> >>  		"Invalid address in generic error data: %#llx\n",
> >>  		physical_addr);
> >> -		return false;
> >> +		return;
> >> +	}
> >> +
> >> +	if (flags == MF_ACTION_REQUIRED && current->mm) {
> >> +		twcb = kmalloc(sizeof(*twcb), GFP_ATOMIC);
> >> +		if (!twcb)
> >> +			return;
> > 
> > When this kmalloc() fails, the error event might be silently dropped?
> > If so, some warning messages could be helpful.
> 
> Yes, I was going to add a warning messages like:
> 
>     pr_err("Failed to handle memory failure due to out of memory\n");
> 
> But got a warning about patch when checked by checkpatch.pl.
> 
>    WARNING: Possible unnecessary 'out of memory' message
> 
> I will add it back in next version :)

Oh, I didn't know about this warning.  I checked the commit log introduced
this meesages, and the justification makes sense to me. So I'd like to
withdraw my comment about this (I mean you don't have to add it back).

  commit ebfdc40969f24fc0cdd1349835d36e8ebae05374                            
  Author: Joe Perches <joe@xxxxxxxxxxx>                                      
  Date:   Wed Aug 6 16:10:27 2014 -0700                                      

      checkpatch: attempt to find unnecessary 'out of memory' messages       

      Logging messages that show some type of "out of memory" error are      
      generally unnecessary as there is a generic message and a stack dump   
      done by the memory subsystem.                                          

      These messages generally increase kernel size without much added value.

Thanks,
Naoya Horiguchi