Thanks for your patient explanations.

> STEP2: In IRQ context, ghes_proc_in_irq() queues memory failure work on current CPU
> in workqueue and add task work to sync with the workqueue.

Why is there a difference if the interrupted task was a user task vs. a
kernel thread? It seems arbitrary. If the error can be handled in the
kernel thread case without a task_work_add() to the current process,
can't all errors be handled this way? The current thread likely has
nothing to do with the error. Just a matter of chance on what is running
when the NMI is delivered, right?

-Tony