RE: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user_range()

"Luck, Tony" <tony.luck@xxxxxxxxx> · Wed, 26 Apr 2023 15:45:42 +0000

> >> Thanks for your confirm, and what your option about add
> >> MCE_IN_KERNEL_COPYIN to EX_TYPE_DEFAULT_MCE_SAFE/FAULT_MCE_SAFE type
> >> to let do_machine_check call queue_task_work(&m, msg, kill_me_never),
> >> which kill every call memory_failure_queue() after mc safe copy return?
> >
> > I haven't been following this thread closely. Can you give a link to the e-mail
> > where you posted a patch that does this? Or just repost that patch if easier.
>
> The major diff changes is [1], I will post a formal patch when -rc1 out,
> thanks.
>
> [1]
> https://lore.kernel.org/linux-mm/6dc1b117-020e-be9e-7e5e-a349ffb7d00a@xxxxxxxxxx/

There seem to be a few misconceptions in that message. Not sure if all of them
were resolved.  Here are some pertinent points:

>>> In my understanding, an MCE should not be triggered when MC-safe copy 
>>> tries
>>> to access to a memory error.  So I feel that we might be talking about
>>> different scenarios.

This is wrong. There is still a machine check when a MC-safe copy does a read
from a location that has a memory error.

The recovery flow in this case does not involve queue_task_work(). That is only
useful for machine check exceptions taken in user context. The queued work will
be executed to call memory_failure() from the kernel, but in process context (not
from the machine check exception stack) to handle the error.

For machine checks taken by kernel code (MC-safe copy functions) the recovery
path is here:

                if (m.kflags & MCE_IN_KERNEL_RECOV) {
                        if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
                                mce_panic("Failed kernel mode recovery", &m, msg);
                }

                if (m.kflags & MCE_IN_KERNEL_COPYIN)
                        queue_task_work(&m, msg, kill_me_never);

The "fixup_exception()" ensures that on return from the machine check handler
code returns to the extable[] fixup location instead of the instruction that was
loading from the memory error location.

When the exception was from one of the copy_from_user() variants it makes
sense to also do the queue_task_work() because the kernel is going to return
to the user context (with an EFAULT error code from whatever system call was
attempting the copy_from_user()).

But in the core dump case there is no return to user. The process is being
terminated by the signal that leads to this core dump. So even though you
may consider the page being accessed to be a "user" page, you can't fix
it by queueing work to run on return to user.

I don't have an well thought out suggestion on how to make sure that memory_failure()
is called for the page in this case. Maybe the core dump code can check for the
return from the MC-safe copy it is using and handle it in the error path?

-Tony