--Andy > On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > >> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote: >>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote: >>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>>>> There's a bug here that will need a small change to the entry code. >>>>> >>>>> Mike Galbraith reported: >>>>> >>>>> WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb >>>>> >>>>> After some looking I found that it's caused by the following code >>>>> snippet in the 'interrupt' macro in entry_64.S: >>>>> >>>>> /* >>>>> * Save previous stack pointer, optionally switch to interrupt stack. >>>>> * irq_count is used to check if a CPU is already on an interrupt stack >>>>> * or not. While this is essentially redundant with preempt_count it is >>>>> * a little cheaper to use a separate counter in the PDA (short of >>>>> * moving irq_enter into assembly, which would be too much work) >>>>> */ >>>>> movq %rsp, %rdi >>>>> incl PER_CPU_VAR(irq_count) >>>>> cmovzq PER_CPU_VAR(irq_stack_ptr), %rsp >>>>> UNWIND_HINT_REGS base=rdi >>>>> pushq %rdi >>>>> UNWIND_HINT_REGS indirect=1 >>>>> >>>>> The problem is that it's changing the stack pointer *before* writing the >>>>> previous stack pointer (push %rdi). So when unwinding from an NMI which >>>>> hit between the rsp write and the rdi push, the unwinder tries to access >>>>> the regs on the previous stack (by reading rdi), but the previous stack >>>>> pointer isn't there yet, so the access is considered out of bounds. >>>> >>>> Ugh, that code. Does this problem go away with this patch applied: >>>> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1 >>>> >>>> If so, want to update the patch for new kernels (shouldn't conflict >>>> with anything except your unwind hints)? >>> >>> I don't think that patch will fix it, because it still updates rsp >>> *before* writing the old rsp on the new stack. So there's still a >>> window where the "previous stack" pointer is missing. >> >> But it's in a register. Is undwarf not able to grok that? > > Sorry, I didn't explain it very well. Undwarf can find the regs pointer > in rdi, it just doesn't trust its value. > > See the stack_info.next_sp field, which is set in in_irq_stack(): > > /* > * The next stack pointer is the first thing pushed by the entry code > * after switching to the irq stack. > */ > info->next_sp = (unsigned long *)*(end - 1); > > It's a safety mechanism. The unwinder needs the last word of the irq > stack page to point to the previous stack. That way it can double check > that the stack pointer it calculates is within the bounds of either the > current stack or the previous stack. > > In the above code, the previous stack pointer (or next stack pointer, > depending on your perspective) hasn't been set up before it switches > stacks. So the unwinder reads an uninitialized value into > info->next_sp, and compares that with the regs pointer, and then stops > the unwind because it thinks it went off into the weeds. > That should be manageable, though, I think. With my patch applied (and maybe even without it), the only exception to that rule is if regs->sp points just above the top of the IRQ stack and the next instruction is push reg. In that case, the reg is exactly as trustworthy as the normal rule.* Can you teach the unwinding code that this is okay? * If an NMI hits right there, then it relies on unwinding out of the NMI correctly. But the usual checks that the target stack is a valid stack should prevent us from going off into the weeds regardless. > -- > Josh -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html