Re: [PATCH v2] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak

Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> · Tue, 27 Sep 2022 11:50:24 +0800

在 2022/9/26 PM11:20, Luck, Tony 写道:
>>>
>>> -               if (task_work_pending && current->mm != &init_mm) {
>>> +               if (task_work_pending && current->mm) {
>>>                         estatus_node->task_work.func = ghes_kick_task_work;
>>>                         estatus_node->task_work_cpu = smp_processor_id();
>>>                         ret = task_work_add(current, &estatus_node->task_work,
> 
> It seems that you are getting errors reported while running kernel threads. This fix avoids
> pointlessly adding "task_work" that will never be processed because kernel threads never
> return to user mode.

Yes, you are right.

> 
> But maybe something else needs to be done? The code was, and with this fix still is,
> taking no action for the error. That doesn't seem right.

Sorry, I don't think so. As far as I know, on Arm platform, hardware error can signal
exceptions including:

- Synchronous External Abort (SEA), e,g. data abort or instruction abort
- Asynchronous External Abort (signalled as SErrors), e,g. L2 can generate SError for
  error responses from the interconnect for a Device or Non-cacheable store
- Fault Handling and Error Recovery interrupts: DDR mainline/demand/scrubber error interrupt

When the error signals asynchronous exceptions (SError or interrupt), any kind of thread can
be interrupted, including kernel thread. Because asynchronous exceptions are signaled in
background, the errors are detected outside of the current execution context.

The GHES driver always queues work to handle memory failure of a page in memory_failure_queue().
If a kernel thread is interrupted:

- Without this fix, the added task_work will never be processed so that the work will not
  be canceled.
- With this fix, the task_work will not be added.

In a conclusion, the error will be handled in a kworker with or without this fix.

The point of fix is that:

- The condition is useless because it is always tree. And I think it is not the original patch
  intends to do.
- The current code leads to memory leaks. The estatus_node will not be freed when task_work is
  added to a kernel thread.

> 
> Are you injecting errors to create this scenario? 

Yes, I am injecting error to create such scenario. After 200 injections, the ghes_estatus_pool
will run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. Finally, a lot of unhandled
events may cause platform firmware to exceed some threshold and reboot.

> What does your test do?

My injection includes two steps:

- mmap a device memory for userspace in a driver
- inject uc and trigger write like ras-tools does

I have opened source the code and you can find here[1]. It's forked from your repo and mainly based
on your code :)

By the way, do you have any plans to extend ras-tools to Arm platform. And is there a mail-list
for ras-tools? I send a mail to add test cases to ras-tools several weeks ago, but no response.
May your mailbox regards it as Spam.

[1] https://gitee.com/anolis/ras-tools/tree/arm-devel