Re: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user_range()

HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx> · Thu, 27 Apr 2023 02:31:08 +0000

On Thu, Apr 27, 2023 at 09:06:46AM +0800, Kefeng Wang wrote:
> 
> 
> On 2023/4/26 23:45, Luck, Tony wrote:
> > > > > Thanks for your confirm, and what your option about add
> > > > > MCE_IN_KERNEL_COPYIN to EX_TYPE_DEFAULT_MCE_SAFE/FAULT_MCE_SAFE type
> > > > > to let do_machine_check call queue_task_work(&m, msg, kill_me_never),
> > > > > which kill every call memory_failure_queue() after mc safe copy return?
> > > > 
> > > > I haven't been following this thread closely. Can you give a link to the e-mail
> > > > where you posted a patch that does this? Or just repost that patch if easier.
> > > 
> > > The major diff changes is [1], I will post a formal patch when -rc1 out,
> > > thanks.
> > > 
> > > [1]
> > > https://lore.kernel.org/linux-mm/6dc1b117-020e-be9e-7e5e-a349ffb7d00a@xxxxxxxxxx/
> > 
> > There seem to be a few misconceptions in that message. Not sure if all of them
> > were resolved.  Here are some pertinent points:
> > 
> > > > > In my understanding, an MCE should not be triggered when MC-safe copy
> > > > > tries
> > > > > to access to a memory error.  So I feel that we might be talking about
> > > > > different scenarios.
> > 
> > This is wrong. There is still a machine check when a MC-safe copy does a read
> > from a location that has a memory error.

Yes, the above was my first impression to be proven wrong ;)

> > 
> > The recovery flow in this case does not involve queue_task_work(). That is only
> > useful for machine check exceptions taken in user context. The queued work will
> > be executed to call memory_failure() from the kernel, but in process context (not
> > from the machine check exception stack) to handle the error.
> > 
> > For machine checks taken by kernel code (MC-safe copy functions) the recovery
> > path is here:
> > 
> >                  if (m.kflags & MCE_IN_KERNEL_RECOV) {
> >                          if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
> >                                  mce_panic("Failed kernel mode recovery", &m, msg);
> >                  }
> > 
> >                  if (m.kflags & MCE_IN_KERNEL_COPYIN)
> >                          queue_task_work(&m, msg, kill_me_never);
> > 
> > The "fixup_exception()" ensures that on return from the machine check handler
> > code returns to the extable[] fixup location instead of the instruction that was
> > loading from the memory error location.
> > 
> > When the exception was from one of the copy_from_user() variants it makes
> > sense to also do the queue_task_work() because the kernel is going to return
> > to the user context (with an EFAULT error code from whatever system call was
> > attempting the copy_from_user()).
> > 
> > But in the core dump case there is no return to user. The process is being
> > terminated by the signal that leads to this core dump. So even though you
> > may consider the page being accessed to be a "user" page, you can't fix
> > it by queueing work to run on return to user.
> 
> For coredump，the task work will be called too, see following code,
> 
> get_signal
> 	sig_kernel_coredump
> 		elf_core_dump
> 			dump_user_range
> 				_copy_from_iter // with MC-safe copy, return without panic
> 	do_group_exit(ksig->info.si_signo);
> 		do_exit
> 			exit_task_work
> 				task_work_run
> 					kill_me_never
> 						memory_failure
> 
> I also add debug print to check the memory_failure() processing after
> add MCE_IN_KERNEL_COPYIN to MCE_SAFE exception type, also tested CoW of
> normal page and huge page, it works too.

Sounds nice to me.
Maybe this information is worth documenting in the patch description.

Thanks,
Naoya Horiguchi