+ Peter.

On Fri, Sep 21, 2018 at 11:17:04PM +0100, James Morse wrote:
> arm64 can take an NMI-like error notification when user-space steps in
> some corrupt memory. APEI's GHES code will call memory_failure_queue()
> to schedule the recovery work. We then return to user-space, possibly
> taking the fault again.
>
> Currently the arch code unconditionally signals user-space from this
> path, so we don't get stuck in this loop, but the affected process
> never benefits from memory_failure()'s recovery work. To fix this we
> need to know the recovery work will run before we get back to user-space.
>
> Increase the priority of the recovery work by scheduling it on the
> system_highpri_wq, then try to bump the current task off this CPU
> so that the recovery work starts immediately.
>
> Reported-by: Xie XiuQi <xiexiuqi@xxxxxxxxxx>
> Signed-off-by: James Morse <james.morse@xxxxxxx>
> Reviewed-by: Punit Agrawal <punit.agrawal@xxxxxxx>
> Tested-by: Tyler Baicar <tbaicar@xxxxxxxxxxxxxx>
> Tested-by: gengdongjiu <gengdongjiu@xxxxxxxxxx>
> CC: Xie XiuQi <xiexiuqi@xxxxxxxxxx>
> CC: gengdongjiu <gengdongjiu@xxxxxxxxxx>
> ---
>  mm/memory-failure.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0cd3de3550f0..4e7b115cea5a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -56,6 +56,7 @@
>  #include <linux/memory_hotplug.h>
>  #include <linux/mm_inline.h>
>  #include <linux/memremap.h>
> +#include <linux/preempt.h>
>  #include <linux/kfifo.h>
>  #include <linux/ratelimit.h>
>  #include <linux/page-isolation.h>
> @@ -1454,6 +1455,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>   */
>  void memory_failure_queue(unsigned long pfn, int flags)
>  {
> +	int cpu = smp_processor_id();
>  	struct memory_failure_cpu *mf_cpu;
>  	unsigned long proc_flags;
>  	struct memory_failure_entry entry = {
> @@ -1463,11 +1465,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
>
>  	mf_cpu = &get_cpu_var(memory_failure_cpu);
>  	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> -	if (kfifo_put(&mf_cpu->fifo, entry))
> -		schedule_work_on(smp_processor_id(), &mf_cpu->work);
> -	else
> +	if (kfifo_put(&mf_cpu->fifo, entry)) {
> +		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
> +		set_tsk_need_resched(current);
> +		preempt_set_need_resched();

What guarantees the workqueue would run before the process?

I see this:

  ``WQ_HIGHPRI``
    Work items of a highpri wq are queued to the highpri
    worker-pool of the target cpu. Highpri worker-pools are
    served by worker threads with elevated nice level.

but is that enough?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.