On Wed, Sep 28, 2016 at 12:16 AM, Peter Zijlstra <peterz at infradead.org> wrote:
> On Tue, Sep 27, 2016 at 05:22:13PM -0700, Vineet Gupta wrote:
>
>> > Yeah, Sparc64 might be a better example, it more closely matches your
>> > hardware. See
>> > arch/sparc/include/asm/irqflags_64.h:arch_local_irq_save().
>>
>> So I finally got around to doing this and as expected has turned out to be quite
>> some fun. I have a couple of questions and would really appreciate your inputs there.
>>
>> 1. Is it OK in general to short-circuit preemption off irq checks for NMI style
>> interrupts.
>
> Yes. If the NMI returns to kernel space you must not attempt preemption
> for reasons you found :-),

Last time I looked at this, I decided that there was no reason that
NMIs would ever need to handle preemption.  Even if the NMI hit
interruptible kernel code, anything that would cause preemption to be
needed would send an IPI (and thus cause preemption) right after the
NMI finished.  NMI handlers themselves have no business setting
TIF_NEED_RESCHED or similar.

> if the NMI returns to userspace you should do
> the normal return to user bits, I think.

x86 does this for simplicity.  There was a really nasty corner case
that I could only figure out how to solve by special casing NMIs from
user space.  I'm not sure that it's actually necessary from a
non-arch-specific POV to handle all the usual return-to-userspace work
on NMI.  But maybe perf NMIs can send signals?

x86's MCEs *do* need the full return-to-userspace handling for memory
failure to work right.  MCE is kind of like NMI...

>
>> 2. The low level return code, resume_user_mode_begin and/or resume_kernel_mode
>> require interrupt safety, does that need to be NMI safe as well. We of course want
>> the very late register restore parts to be non-interruptible, but is this required
>> before we call preempt_schedule_irq() off of asm code.
>
> Urgh, I'm never quite sure on the details here, I've Cc'ed Andy who
> might actually know this off the top of his head. I'll try and dig
> through x86 to see what it does.

On x86, it's quite simple.  IRQs are *always* off during the final
register restore, and we don't re-check for preemption there.  x86
handles preemption after turning off IRQs, and IRQs are guaranteed to
stay off until we actually return to userspace.

The code is almost entirely in C in arch/x86/entry/common.c.  There
isn't anything particularly x86-specific in there.
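
To make the ordering concrete, here is a rough userspace model of the
behaviour described above.  It is a sketch only, not the code from
arch/x86/entry/common.c: struct cpu_model, the WORK_* flags and both
function names are made up for illustration, and the model assumes
that pending work is handled with IRQs on and then re-checked with
IRQs off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work flags, loosely modeled on TIF_* bits. */
#define WORK_NEED_RESCHED 0x1u
#define WORK_SIGPENDING   0x2u

/* Toy stand-in for per-CPU state; nothing here touches real hardware. */
struct cpu_model {
	bool irqs_on;		/* simulated interrupt-enable state */
	unsigned int work;	/* pending return-to-user work */
	int schedules;		/* times we pretended to call schedule() */
	int signals;		/* times we pretended to deliver a signal */
};

/*
 * Model of a return-to-userspace work loop: check for work with IRQs
 * off, do the work with IRQs on, then turn IRQs off and check again.
 * The loop only exits with IRQs off and nothing pending.
 */
void exit_to_user_loop(struct cpu_model *cpu)
{
	cpu->irqs_on = false;

	while (cpu->work) {
		unsigned int work = cpu->work;	/* snapshot taken with IRQs off */

		cpu->irqs_on = true;		/* the work itself may be interrupted */

		if (work & WORK_NEED_RESCHED) {
			cpu->work &= ~WORK_NEED_RESCHED;
			cpu->schedules++;
		}
		if (work & WORK_SIGPENDING) {
			cpu->work &= ~WORK_SIGPENDING;
			cpu->signals++;
		}

		cpu->irqs_on = false;		/* off again before re-checking */
	}

	/* The final register restore would follow; IRQs must still be off. */
	assert(!cpu->irqs_on);
}

/*
 * Policy for returns to kernel mode: an ordinary IRQ may check for
 * preemption, an NMI-style interrupt must not.
 */
bool may_preempt_on_kernel_return(bool from_nmi)
{
	return !from_nmi;
}
```

Calling exit_to_user_loop() on a model with both flags set should
leave it with IRQs off, no work pending, and one reschedule and one
signal counted.  The point of the shape is that there is no window
between the last check and the register restore in which new work
could be flagged and then missed.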