On 2023/3/17 AM8:29, HORIGUCHI NAOYA(堀口 直也) wrote: > On Thu, Mar 16, 2023 at 07:10:56PM +0800, Shuai Xue wrote: >> >> >> On 2023/3/16 PM3:21, HORIGUCHI NAOYA(堀口 直也) wrote: >>> On Mon, Feb 27, 2023 at 01:03:15PM +0800, Shuai Xue wrote: >>>> Hardware errors could be signaled by synchronous interrupt, e.g. when an >>>> error is detected by a background scrubber, or signaled by synchronous >>>> exception, e.g. when an uncorrected error is consumed. Both synchronous and >>>> asynchronous error are queued and handled by a dedicated kthread in >>>> workqueue. >>>> >>>> commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for >>>> synchronous errors") keep track of whether memory_failure() work was >>>> queued, and make task_work pending to flush out the workqueue so that the >>>> work for synchronous error is processed before returning to user-space. >>>> The trick ensures that the corrupted page is unmapped and poisoned. And >>>> after returning to user-space, the task starts at current instruction which >>>> triggering a page fault in which kernel will send SIGBUS to current process >>>> due to VM_FAULT_HWPOISON. >>>> >>>> However, the memory failure recovery for hwpoison-aware mechanisms does not >>>> work as expected. For example, hwpoison-aware user-space processes like >>>> QEMU register their customized SIGBUS handler and enable early kill mode by >>>> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify >>>> the process by sending a SIGBUS signal in memory failure with wrong >>>> si_code: the actual user-space process accessing the corrupt memory >>>> location, but its memory failure work is handled in a kthread context, so >>>> it will send SIGBUS with BUS_MCEERR_AO si_code to the actual user-space >>>> process instead of BUS_MCEERR_AR in kill_proc(). >>>> >>>> To this end, separate synchronous and asynchronous error handling into >>>> different paths like X86 platform does: >>>> >>>> - task work for synchronous errors. >>>> - and workqueue for asynchronous errors. >>>> >>>> Then for synchronous errors, the current context in memory failure is >>>> exactly belongs to the task consuming poison data and it will send SIBBUS >>>> with proper si_code. >>>> >>>> Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") >>>> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> >>> ... >>>> >>>> /* >>>> - * Called as task_work before returning to user-space. >>>> - * Ensure any queued work has been done before we return to the context that >>>> - * triggered the notification. >>>> + * struct mce_task_work - for synchronous RAS event >>> >>> This seems to handle synchronous memory errors, not limited to MCE? >>> So naming this struct as such (more generally) might be better. >> >> Yes. How about `sync_task_work`? > > Sounds better to me. Fine, I will change it in next version. >> >>> >>>> + * >>>> + * @twork: callback_head for task work >>>> + * @pfn: page frame number of corrupted page >>>> + * @flags: fine tune action taken >>>> + * >>>> + * Structure to pass task work to be handled before >>>> + * ret_to_user via task_work_add(). >>>> */ >>> ... >>> >>>> } >>>> >>>> -static bool ghes_do_memory_failure(u64 physical_addr, int flags) >>>> +static void ghes_do_memory_failure(u64 physical_addr, int flags) >>>> { >>>> unsigned long pfn; >>>> + struct mce_task_work *twcb; >>>> >>>> if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) >>>> - return false; >>>> + return; >>>> >>>> pfn = PHYS_PFN(physical_addr); >>>> if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) { >>>> pr_warn_ratelimited(FW_WARN GHES_PFX >>>> "Invalid address in generic error data: %#llx\n", >>>> physical_addr); >>>> - return false; >>>> + return; >>>> + } >>>> + >>>> + if (flags == MF_ACTION_REQUIRED && current->mm) { >>>> + twcb = kmalloc(sizeof(*twcb), GFP_ATOMIC); >>>> + if (!twcb) >>>> + return; >>> >>> When this kmalloc() fails, the error event might be silently dropped? >>> If so, some warning messages could be helpful. >> >> Yes, I was going to add a warning messages like: >> >> pr_err("Failed to handle memory failure due to out of memory\n"); >> >> But got a warning about patch when checked by checkpatch.pl. >> >> WARNING: Possible unnecessary 'out of memory' message >> >> I will add it back in next version :) > > Oh, I didn't know about this warning. I checked the commit log introduced > this meesages, and the justification makes sense to me. So I'd like to > withdraw my comment about this (I mean you don't have to add it back). > > commit ebfdc40969f24fc0cdd1349835d36e8ebae05374 > Author: Joe Perches <joe@xxxxxxxxxxx> > Date: Wed Aug 6 16:10:27 2014 -0700 > > checkpatch: attempt to find unnecessary 'out of memory' messages > > Logging messages that show some type of "out of memory" error are > generally unnecessary as there is a generic message and a stack dump > done by the memory subsystem. > > These messages generally increase kernel size without much added value. Haha, that's exactly the patch I was referring to (Sorry for forgetting to attach a link in last reply). So I will not add the warning messages back. > > Thanks, > Naoya Horiguchi Thank you for comments. Cheers. Shuai