----- Original Message ----- > On Tue, Jun 14, 2016 at 10:42:03AM -0400, Dave Anderson wrote: > > > > > On Mon, Jun 13, 2016 at 11:30:24AM -0400, Dave Anderson wrote: > > > > > > > > > Dave, > > > > > > > > > > On Fri, Jun 10, 2016 at 04:37:42PM -0400, Dave Anderson wrote: > > > > > > Hi Takahiro, > > > > > > > > > > > > To address my concerns about your patch, I added a few additional changes and attached > > > > > > it to this email. The changes are: > > > > > > > > > > > > (1) Prevent the stack dump "below" the #0 level. Yes, the stack data region is contained within > > > > > > the incoming frame parameters, but it's ugly and we really don't > > > > > > care to see what's before > > > > > > the #0 crash_kexec and crash_save_cpu #0 frames. > > > > > > (2) Fill in the missing stack dump at the top of the process stack, up to, but not including > > > > > > the user-space exception frame. > > > > > > (3) Instead of showing the fp of 0 in the top-most frame's stack address, fill it in with the > > > > > > address of the user-space exception frame. > > > > > > > > > > > > Note that there is no dump of the stack containing the user-space exception frame, but the > > > > > > register dump itself should suffice. > > > > > > > > > > Well, the essential problem with my patch is that the output from "bt -f" > > > > > looks like: > > > > > #XX ['fp'] 'function' at 'pc' --- (1) > > > > > <function's stack dump> --- (2) > > > > > but that (1) and (2) are not printed as a single stack frame in the same > > > > > iteration of while loop in arm64_back_trace_cmd(). > > > > > (I hope you understand what I mean :) > > > > > > > > Actually I prefer your first approach. I find this new one confusing, not > > > > to mention unlike any of the other architectures in that the "frame level" > > > > #X address value is not contiguous with the stack addresses that get filled > > > > in by -f. > > > > > > Can you please elaborate a bit here about "is not contiguous"? > > > > I mean that the #X [address] is not contiguous with the stack addresses > > above and below it. For example: > > > > ffff8003dc103d60: ffff8003dc103dc0 ffff80000028041c > > ffff8003dc103d70: 0000000000000000 0000000000000022 > > ffff8003dc103d80: ffff8003db846b00 ffff8003db846b00 > > #3 [ffff8003dc103cf0] schedule_hrtimeout_range_clock at ffff8000007786f0 > > ffff8003dc103d90: ffff8003dc103dc0 ffff80000028052c > > ffff8003dc103da0: 0000000000000000 0000000000000022 > > ffff8003dc103db0: 0000000000000000 0000000000000000 > > > > > > > > > Taking your picture into account: > > > > > > > > stack grows to lower addresses. > > > > /|\ > > > > | > > > > | | > > > > new sp +------+ <--- > > > > |dyn | | > > > > | vars | | > > > > new fp +- - - + | > > > > |old fp| | a function's stack frame > > > > |old lr| | > > > > |static| | > > > > | vars| | > > > > old sp +------+ <--- > > > > |dyn | > > > > | vars | > > > > old fp +------+ > > > > | | > > > > > > > > Your first patch seemed natural to me because for any "#X" line containing a function > > > > name, that function's dynamic variables, the "old fp/old lr" pair, and the function's > > > > static variables were dumped below it (i.e., at higher stack > > > > addresses). > > > > > > > > > > > > > To be consistent with the out format of x86, the output should be > > > > > <function's stack dump> > > > > > #XX ['fp'] 'function' at 'pc' > > > > > > > > > > Unfortunately, this requires that arm64_back_trace_cmd() and other > > > > > functions should be overhauled. > > > > > Please take a look at my next patch though it is uncompleted and > > > > > still has room for improvement. > > > > > > > > I don't know what you mean by "consistent with the out format of x86"? With x86_64, > > > > each #<level> line is simply the stack address where the function pushed its return > > > > address as a result of its making a "callq" to the next function. Any local variables of > > > > the calling function would be at the next higher stack addresses: > > > > > > > > ... > > > > #X [stack address] function2 at 'return address' > > > > <function2's local variables> > > > > #Y [stack address] function1 at 'return address' > > > > <functions1's local variables> > > > > ... > > > > > > > > So for digging out local stack variables associated with a function, > > > > it's a simple > > > > matter of looking "below" it in the "bt -f" output. > > > > > > That is exactly what I meant by "consistent with x86." > > > On x86, the output looks like: > > > > > > <function2's local variables> > > > #X [stack address] function2 at 'return address' > > > <functions1's local variables> > > > #Y [stack address] function1 at 'return address' > > > ... > > > > No, that's not true -- look at my #X and #Y description above -- funcion2's local > > variables are at higher stack addresses than the #X "stack entry" address. They > > have to be -- the callq that pushes the return address at the #X stack location is > > the last stack manipulation that the function does. Expressed otherwise, a function's > > local variables are displayed "below" or "underneath" its #X line in the "bt -f" > > output. > > OK > > > > > > > So users who are familiar with this format may get confused. > > > (Or do I misunderstand anything?) > > > > > > In addition, my previous patch displays > > > <function2's local variables> > > > #Y [stack address] function1 at 'return address' > > > in arm64_print_stackframe_entry(), and it sounds odd to me. > > > > BTW, the order in which it is done is based upon the kernel's dump_backtrace() > > function, although I'll grant you that the kernel dump is only interested in > > the pc. > > Good point. The kernel's dump_backtrace() doesn't show the contents of > stack at each stack frame. If it did, we would see the same problem. > > In fact, backtracing a stack on arm64 precisely is a quite difficult > thinking of possible corner cases. For example, > a. A value of stack pointer is hard to determine > It would be a good idea to use a dwarf unwinder. > b. No actual stack frame for an exception entry code > So we need a special handling around the entry code > c. If an interrupt occurs during a function prologue, a stack frame > can partially or even worse never be created on a stack. > > If you're interested, see my past RFC: > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/369316.html > > You may understand why I get so stuck with backtrace issues. Uhhhh, yeah -- I do now! ;-) > > > > > > > But, anyhow, it's up to you. > > > > > > > OK! Thanks for giving in... ;-) > > Not really yet :) > > >From the vmlinux/vmcore binaries that I sent you before, > with my v1 (and also Dave's) patch, > crash> bt -f > PID: 1223 TASK: ffff800020ef5780 CPU: 3 COMMAND: "sh" > #0 [ffff800020b0ba70] crash_kexec at ffff00000812b0ac > ffff800020b0ba60: ffffffffffffffff 6533333035642030 > ffff800020b0ba70: ffff800020b0ba90 ffff000008088ce8 > ffff800020b0ba80: ffff000008e67000 ffff800020b0bc20 > ... > #4 [ffff800020b0bb60] do_translation_fault at ffff00000809690c > ffff800020b0bb60: ffff800020b0bb70 ffff00000808128c > #5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c > ffff800020b0bb70: ffff800020b0bd40 ffff000008084568 > ffff800020b0bb80: ffff000008dda000 0000000000000063 > ... > ffff800020b0bc00: 0000000000000000 0000000000000015 > ffff800020b0bc10: 0000000000000120 0000000000000040 > ffff800020b0bc20: 0000000000000001 0000000000000000 <- (1) > ffff800020b0bc30: ffff000008dda7b8 0000000000000000 > ffff800020b0bc40: 0000000000000000 00000000000047d4 > ... > ffff800020b0bd10: ffff000008457fb4 ffff800020b0bd40 > ffff800020b0bd20: ffff000008457fc8 0000000060400149 > ffff800020b0bd30: ffff000008dda000 ffff80002104d418 > #6 [ffff800020b0bd40] el1_da at ffff000008084568 > PC: ffff000008457fc8 [sysrq_handle_crash+32] > LR: ffff000008457fb4 [sysrq_handle_crash+12] > SP: ffff800020b0bd40 PSTATE: 60400149 > ... > ORIG_X0: ffff000008dda000 SYSCALLNO: ffff80002104d418 <- (2) > ffff800020b0bd40: ffff800020b0bd50 ffff000008458644 > #7 [ffff800020b0bd50] __handle_sysrq at ffff000008458644 <- (3) > ffff800020b0bd50: ffff800020b0bd90 ffff000008458ac0 > ffff800020b0bd60: 0000000000000002 fffffffffffffffb > ffff800020b0bd70: 0000000000000000 ffff800020b0bec8 > ffff800020b0bd80: 000000001e9da808 ffff800021ba84a0 > ... > > (1) This is actually an end of stack frame of #5 (do_mem_abort), > and also a start of exception frame of #6 (el1_da). > > (2) Those are meaningless for any exceptions in the kernel. > > (3) We miss one stack frame for "sysrq_handle_crash" > due to (b) I mentioned above. > > Thanks, > -Takahiro AKASHI Notwithstanding the cases above, but given the improvement in readability, I've checked the patch in as the basis for any future patches you may propose in the future: https://github.com/crash-utility/crash/commit/d1db3ff7581cab2ec5263291afc885b5b4fbfb1d Thanks, Dave -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility