On Tue, Aug 17, 2021 at 05:29:40PM -0700, Tony Luck wrote:
> Recovery action when get_user() triggers a machine check uses the fixup
> path to make get_user() return -EFAULT. Also queue_task_work() sets up
> so that kill_me_maybe() will be called on return to user mode to send
> a SIGBUS to the current process.
>
> But there are places in the kernel where the code assumes that this
> EFAULT return was simply because of a page fault. The code takes some
> action to fix that, and then retries the access. This results in a second
> machine check.
>
> While processing this second machine check queue_task_work() is called
> again. But since this uses the same callback_head structure that was used
> in the first call, the net result is an entry on the current->task_works
> list that points to itself. When task_work_run() is called it loops
> forever in this code:
>
> 	do {
> 		next = work->next;
> 		work->func(work);
> 		work = next;
> 		cond_resched();
> 	} while (work);
>
> Add a counter (current->mce_count) to keep track of repeated machine
> checks before task_work() is called. First machine check saves the address
> information and calls task_work_add(). Subsequent machine checks before
> that task_work call back is executed check that the address is in the
> same page as the first machine check (since the callback will offline
> exactly one page).
>
> Expected worst case is two machine checks before moving on (e.g. one user
> access with page faults disabled, then a repeat to the same address with
> page faults enabled). Just in case there is some code that loops forever
> enforce a limit of 10.
>
> Cc: <stable@xxxxxxxxxxxxxxx>

What about a Fixes: tag?

I guess backporting this to the respective kernels is predicated upon
the existence of those other "places" in the kernel where code assumes
the EFAULT was because of a #PF.

Hmmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
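
For illustration only: the self-referencing list entry described in the
quoted commit message, and the counter-based guard it proposes, can be
mimicked in a small user-space C sketch. This is not the kernel code or the
actual patch. struct work_item, struct task_state, push_work(), queue_once()
and the SKETCH_* constants are hypothetical names; the real task_work_add()
uses atomic operations that are omitted here, and returning -1 on a limit or
page mismatch is an assumption, since the commit message does not say what
enforcing the limit does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct callback_head. */
struct work_item {
	struct work_item *next;
};

/* Simplified head insertion (assumption: no atomics, no locking). */
static void push_work(struct work_item **head, struct work_item *w)
{
	w->next = *head;
	*head = w;
}

/* Returns 1 if the entry's next pointer refers back to the entry itself. */
static int points_to_self(const struct work_item *w)
{
	return w->next == w;
}

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_MCE_LIMIT  10	/* limit named in the commit message */

/* Hypothetical per-task state, loosely mirroring current->mce_count. */
struct task_state {
	struct work_item *works;
	struct work_item mce_work;
	unsigned long long mce_addr;
	int mce_count;
};

/*
 * Queue the work entry only for the first machine check.
 * Returns 1 if the entry was queued, 0 if it was already queued,
 * and -1 if the limit is exceeded or the address is in another page.
 */
static int queue_once(struct task_state *t, unsigned long long addr)
{
	int count = ++t->mce_count;

	if (count == 1)
		t->mce_addr = addr;	/* first call saves the address */

	if (count > SKETCH_MCE_LIMIT)
		return -1;

	if (count > 1 &&
	    (t->mce_addr >> SKETCH_PAGE_SHIFT) != (addr >> SKETCH_PAGE_SHIFT))
		return -1;

	if (count > 1)
		return 0;		/* do not add the same entry twice */

	push_work(&t->works, &t->mce_work);
	return 1;
}
```

Calling push_work() twice with the same entry leaves that entry's next
pointer aimed at itself, which is why a do/while walk like the one quoted
above would never see a NULL next pointer. With queue_once(), the second
call for an address in the same page returns 0 and leaves the list alone.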