On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: >> > How are you handling control flow? >> >> Control flow of what? >> >> > > Here's the struct in its current state: >> > > >> > > #define UNDWARF_REG_UNDEFINED 0 >> > > #define UNDWARF_REG_CFA 1 >> > > #define UNDWARF_REG_SP 2 >> > > #define UNDWARF_REG_FP 3 >> > > #define UNDWARF_REG_SP_INDIRECT 4 >> > > #define UNDWARF_REG_FP_INDIRECT 5 >> > > #define UNDWARF_REG_R10 6 >> > > #define UNDWARF_REG_DI 7 >> > > #define UNDWARF_REG_DX 8 >> > > >> > >> > Why only those registers? Also, if you have the option I would really >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, >> > si, di, r8-r15 in that order.) >> >> Those are the only registers which are ever needed as the base for >> finding the previous stack frame. 99% of the time it's sp or bp, the >> other registers are needed for aligned stacks and entry code. >> >> Using the actual register numbers isn't an option because I don't need >> them all and they need to fit in a small number of bits. >> >> This construct might be useful for other arches, which is why I called >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > BTW, here's the link to the unwinder code if you're interested: > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c At the risk of potentially overcomplicating matters, here's a suggestion. As far as I know, all (or most all?) unwinders effectively do the following in a loop: 1. Look up the IP to figure out how to unwind from that IP. 2. Use the results of that lookup to compute the previous frame state. The results of step 1 could perhaps be expressed like this: struct reg_formula { unsigned int source_reg :4; long offset; bool dereference; /* true: *(reg + offset); false: (reg + offset) */ /* For DWARF, I think this can be considerably more complicated, but I doubt it's useful. */ }; struct unwind_step { u16 available_regs; /* mask of the caller frame regs that we are able to recover */ struct reg_formula[16]; }; The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus or minus sizeof(unsigned long) or whatever -- I can never remember exactly what CFA refers to.) For a frame pointer-based unwinder, the entire unwind_step is a foregone conclusion independent of IP: SP = BP + 8 (or whatever), BP = *(BP + whatever), all other regs unknown. Could it make sense to actually structure the code this way? I can see a few advantages. It would make the actual meat of the unwind loop totally independent of the unwinding algorithm in use, it would make the meat be dead simple (and thus easy to verify for non-crashiness), and I think it just might open the door for a real in-kernel DWARF unwinder that Linus would be okay with. Specifically, write a function like: bool get_dwarf_step(struct unwind_step *step, unsigned long ip); Put this function in its own file and make it buildable as kernel code or as user code. Write a test case that runs it on every single address on the kernel image (in user mode!) with address-sanitizer enabled (or in Valgrind or both) and make sure that (a) it doesn't blow up and (b) that the results are credible (e.g. by comparing to objtool output). Heck, you could even fuzz-test it where the fuzzer is allowed to corrupt the actual DWARF data. You could do the same thing with whatever crazy super-compacted undwarf scheme someone comes up with down the road, too. I personally like the idea of using real DWARF annotations in the entry code because it makes gdb work better (not kgdb -- real gdb attached to KVM). I bet that we could get entry asm annotations into good shape if we really wanted to. OTOH, getting DWARF to work well for inline asm is really nasty IIRC. (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 that will emit the right DWARF instructions regardless of whether there's a frame pointer or not? .cfi_adjust_cfa_offset is not particularly helpful here because it's totally wrong if the CFA is currently being computed based on BP.) Also, you read the stack like this: *val = *(unsigned long *)addr; how about probe_kernel_read() instead? --Andy -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html