On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
> Recently we encounter multi #MC on the same task when it's
> task_work_run() has not been called, current->mce_kill_me was
> added to task_works list more than once, that make a circular
> linked task_works, so task_work_run() will do a endless loop.

I saw the same and posted a similar fix a while back:

https://www.spinics.net/lists/linux-mm/msg251006.html

It didn't get merged because some validation tests began failing
around the same time.

I'm now pretty sure I understand what happened with those other
tests.

I'll post my updated version (second patch in a three part series)
later today.

> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c

> +	if (!cmpxchg(&current->mce_kill_me.func, NULL, ch.func)) {
> +		current->mce_addr = m->addr;
> +		current->mce_kflags = m->kflags;
> +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> +		current->mce_whole_page = whole_page(m);

You don't need an atomic cmpxchg here (nor the WRITE_ONCE() to
clear it). The task is operating on its own task_struct. Nobody
else should touch the mce_kill_me field.

-Tony
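
For illustration only, here is a minimal userspace sketch of the two
points above: queueing the same node twice on a singly linked work list
makes it point at itself, and a plain (non-atomic) check of the func
field is enough to prevent that when only the owning task touches the
field. This is not the kernel code and not either patch. struct task,
work_add(), queue_mce_work(), kill_me_stub() and work_run() are
hypothetical stand-ins, and the sketch assumes single-threaded access.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct callback_head. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct task {
	struct callback_head *task_works;	/* pending work list */
	struct callback_head mce_kill_me;	/* the single embedded node */
};

/* Push a node on the head of the list (no locking; sketch only). */
void work_add(struct task *t, struct callback_head *w)
{
	w->next = t->task_works;
	t->task_works = w;
}

/* Returns 1 if adding the same node twice leaves it linked to itself. */
int unguarded_double_add_is_circular(void)
{
	struct task t = {0};

	work_add(&t, &t.mce_kill_me);
	work_add(&t, &t.mce_kill_me);	/* next now points back at the node */
	return t.mce_kill_me.next == &t.mce_kill_me;
}

/*
 * Guarded queueing: a plain load and store on func is sufficient under
 * the assumption that only the owning task reads or writes this field.
 * Returns 1 if the work was queued, 0 if it was already pending.
 */
int queue_mce_work(struct task *t, void (*func)(struct callback_head *))
{
	if (t->mce_kill_me.func)
		return 0;
	t->mce_kill_me.func = func;
	work_add(t, &t->mce_kill_me);
	return 1;
}

/* Stub callback: clears func with a plain store so the node can be reused. */
void kill_me_stub(struct callback_head *cb)
{
	cb->func = NULL;
}

/* Detach the list and run every callback once; returns how many ran. */
int work_run(struct task *t)
{
	int n = 0;
	struct callback_head *w = t->task_works;

	t->task_works = NULL;
	while (w) {
		struct callback_head *next = w->next;

		assert(w->func != NULL);
		w->func(w);
		w = next;
		n++;
	}
	return n;
}
```

With the guard, a second queue_mce_work() call before work_run() is
rejected and the list stays a single node, so work_run() terminates
after one callback.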