Dave, My apologies for slow response since I was attending LinuxCon last week. On Wed, Jul 13, 2016 at 04:49:31PM -0400, Dave Anderson wrote: > > Nakahiro, > > As I had last proposed, I have kept the original backtrace method in place, > while making your redesigned method available as a bt option accessible by > using "bt -o", or by using "bt -O" to toggle the default method from the > original to the optional. IMHO, add an "optional" method is not a good idea if you don't see good advantages over the original. The option should be reserved for dwarf-based back-tracer in case that somebody wants to implement it in the future :) > There were a couple bugs in your backtrace method. Anyway, I will try to review your changes, but cannot give it a priority in the next few weeks. Thanks, -Takahiro AKASHI > > (1) Although I can't find a sample vmcore where this piece gets executed, > this part was certainly incorrect: > > + /* > + * Keep a buffer for a while until > + * displaying the last frame on IRQ stack > + * at next arm64_print_stackframe_entry() > + */ > + if (bt->flags | BT_FULL) > + ms->irq_stackbuf = bt->stackbuf; > > (2) And on "older" kernels, your backtrace generates a segmentation violation > on any user-space task. I'm not sure when things changed, but at least on > 3.19 kernels, the fp of the fp/pc stackframe pair that references the > kernel-entry exception is not 0, but rather contains the user-space stack > address. > > For example, in a 4.5-based kernel, here's a typical syscall-generated > fp/pc stackframe, where the fp is 0: > > crash> bt -F 2513 > ... [ cut ] ... > #11 [ffff8003dbf23e80] sys_write at ffff800000237898 > ffff8003dbf23e80: 0000000000000000 __sys_trace_return <==== fp 0 > ffff8003dbf23e90: 0000000000000200 0000ffff76e30000 > ffff8003dbf23ea0: ffffffffffffffff __sys_trace+28 > ffff8003dbf23eb0: 0000000000000002 0000ffff76e30000 > ffff8003dbf23ec0: ffffffffffffffff 0000000000000000 > #12 [ffff8003dbf23ed0] __sys_trace at ffff800000091a88 > PC: 0000ffff7d7dbda8 LR: 0000ffff7d7835d4 SP: 0000fffff90fe1b0 > X29: 0000fffff90fe1b0 X28: 0000000000000000 X27: 00000000004fb000 > X26: 00000000004bb420 X25: 0000000000000001 X24: 00000000004f8000 > X23: 0000000000000000 X22: 0000000000000002 X21: 0000ffff7d881168 > X20: 0000ffff76e30000 X19: 0000000000000002 X18: 0000000000000000 > X17: 0000ffff7d780e20 X16: 0000000000000000 X15: 00192ea0bab15d05 > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000001 > X11: 000000001c1fc6a0 X10: 00000000004fd000 X9: 0000fffff90fe130 > X8: 0000000000000040 X7: 0000000000000001 X6: 0000ffff7d759a98 > X5: 0000000000000001 X4: 00000000fbad2a84 X3: 0000000000000000 > X2: 0000000000000002 X1: 0000ffff76e30000 X0: 0000000000000001 > ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 20000000 > crash> > > But on a 3.19 kernel, it looks like this, where the fp is the user-space > stack address seen in the exception frame: > > crash> bt -F 27755 > ... [ cut ] ... > #10 [fffffe035de9be80] sys_write at fffffe00001e8720 > fffffe035de9be80: 000003ffed468290 __sys_trace_return <== fp 3ffed468290 > fffffe035de9be90: 0000000000000200 000003ff955f0000 > fffffe035de9bea0: ffffffffffffffff __sys_trace+28 > fffffe035de9beb0: 0000000000000002 000003ff955f0000 > fffffe035de9bec0: ffffffffffffffff 0000000000000000 > #11 [fffffe035de9bed0] __sys_trace at fffffe00000939c8 > PC: 000003ff9bf9c668 LR: 000003ff9bf439cc SP: 000003ffed468290 > X29: 000003ffed468290 X28: 0000000000000000 X27: 00000000004fb000 > X26: 00000000004bb330 X25: 0000000000000001 X24: 00000000004f8000 > X23: 0000000000000000 X22: 0000000000000002 X21: 000003ff9c041178 > X20: 000003ff955f0000 X19: 0000000000000002 X18: 0000000000000000 > X17: 000003ff9bf41218 X16: 0000000000000000 X15: 0016431976760606 > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000001 > X11: 0000000000c48820 X10: 00000000004fd000 X9: 000003ffed468220 > X8: 0000000000000040 X7: 0000000000000001 X6: 000003ff9bf19a98 > X5: 0000000000000001 X4: 00000000fbad2a84 X3: 0000000000000000 > X2: 0000000000000002 X1: 000003ff955f0000 X0: 0000000000000001 > ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 20000000 > > Same thing if the kernel was entered via interrupt -- here the 4.5-based kernel: > > crash> bt -F 1 > ... [ cut ] ... > #6 [ffff8003dc103e70] sys_epoll_pwait at ffff8000002806c8 > ffff8003dc103e70: 0000000000000000 el0_svc_naked+36 <=== fp 0 > ffff8003dc103e80: 0000000000000000 0000000000000000 > ffff8003dc103e90: ffffffffffffffff 0000ffff9455a48c > ffff8003dc103ea0: 0000000060000000 0000000039000a80 > ffff8003dc103eb0: 0000000000000008 0000000000000000 > ffff8003dc103ec0: 0000000000000014 000000000ecb0180 > #7 [ffff8003dc103ed0] el0_svc_naked at ffff800000091a2c > PC: 0000ffff9455a48c LR: 0000aaaad308fb48 SP: 0000fffff45629b0 > X29: 0000fffff45629b0 X28: 0000fffff4562ec8 X27: 0000aaaad313f000 > X26: 0000aaaad30bfd70 X25: 0000aaaad313f000 X24: 0000aaaad30c0248 > X23: 0000000000000001 X22: 0000aaaaebf2ea50 X21: 0000fffff45629f0 > X20: 0000000000000000 X19: 0000000000000004 X18: 0000aaaad30e0050 > X17: 0000ffff9455a228 X16: 0000aaaad313f640 X15: 00000000000000d8 > X14: 00e53b496f58b0a8 X13: 0000000000000000 X12: 0000000000000010 > X11: 0000000000000067 X10: 0000000000000065 X9: 0000000000000074 > X8: 0000000000000016 X7: 000000000000cbf6 X6: 000000000003d090 > X5: 0000000000000008 X4: 0000000000000000 X3: ffffffffffffffff > X2: 0000000000000022 X1: 0000fffff45629f0 X0: 0000000000000004 > ORIG_X0: 0000000000000004 SYSCALLNO: 16 PSTATE: 60000000 > crash> > > And then here on the 3.19 kernel: > > crash> bt -F 1 > ... [ cut ] ... > #6 [fffffe035c083e80] sys_epoll_pwait at fffffe0000229bc8 > fffffe035c083e80: 000003ffe63d3610 el0_svc_naked+36 <=== fp 3ffe63d3610 > fffffe035c083e90: 0000000000000000 0000000000000000 > fffffe035c083ea0: ffffffffffffffff 000003ff7bc9ad4c > fffffe035c083eb0: 000003ffe63d3470 work_pending+28 > fffffe035c083ec0: 0000000000000000 0000000000000000 > #7 [fffffe035c083ed0] el0_svc_naked at fffffe000009396c > PC: 000003ff7bc9ad4c LR: 000003ff7c11711c SP: 000003ffe63d3610 > X29: 000003ffe63d3610 X28: 0000000000000001 X27: 000003ff7c21f000 > X26: 000003ff7c195618 X25: 000003ff7c195f88 X24: 000003ff7c1943e8 > X23: 000003ff8e21dc20 X22: 0000000000000000 X21: 000003ffe63d3780 > X20: 0000000000000000 X19: 0000000000000004 X18: 0000000000000800 > X17: 000003ff7bc9aae8 X16: 000003ff7c21f598 X15: 003b9aca00000000 > X14: 002ca73218000000 X13: ffffffffa90e8b35 X12: 0000000000000018 > X11: 00000000127f8e50 X10: 0000000000001102 X9: 000000000001f418 > X8: 0000000000000016 X7: 0000000000000000 X6: 000003ff7c090000 > X5: 0000000000000008 X4: 0000000000000000 X3: ffffffffffffffff > X2: 0000000000000001 X1: 000003ffe63d3780 X0: 0000000000000004 > ORIG_X0: 0000000000000004 SYSCALLNO: 16 PSTATE: 60000000 > > Anyway, that kernel anomoly caused a segmentation violation in your > arm64_gen_hidden_frame() function when it tries to use the user > space stack address in STACK_OFFSET_TYPE(). That turned out to be > my first attempt at maintenance of your patch, and while I've got > it to work without a SIGSEGV, my hack/kludge of a fix doesn't fit > very nicely into your arm64_back_trace_cmd_v2 for() loop. And > on the old kernel, "bt -f" does not display stack memory at and > above the fp/pc stackframe that contain the user-stack fp. Please > feel free to send a follow-up patch. > > Other changes I made to your backtrace method: > > (1) I removed the "--- <Exception in kernel> ---", "--- <Exception in user> ---", > lines. I felt that they are unnecessary given that that it's obvious which > is which, and no other arches print such lines. > (2) I removed the "#0 [user space]" line, which was meant for non-kdump kernels that > did not have a user-space exception frame available, so instead of printing > nothing for a task that was in user-space, it just showed that single line. > (3) I removed the empty linefeed after in-kernel exception frames so that each task's > backtrace is a single blurb without any intervening linefeeds. > > I did make a couple changes to the original backtrace method that > adopts code from your new optional method: > > (1) ORIG_X0 and SYSCALLNO registers are not displayed in kernel > exception frames. > (2) stackframe entry text locations are modified to be the PC address > of the branch instruction instead of the subsequent "return" PC > address. > > That being the case, these are the essential differences between > the original and optional methods: > > (1) original: the starting point of backtraces for the active, non-crashing, > tasks, will continue to have crash_save_cpu() on the IRQ stack as the > starting point; > optional: backtrace will start with the IPI exception frame located > on the process stack > (2) optional: exception entry stackframe adjusted to be located farther > down in the IRQ stack > (3) optional: bt -f does not display IRQ stack memory above the adjusted > exception entry stackframe > (4) optional: may display "(Next exception frame might be wrong)" > > The combined patchset is queued for crash-7.1.6: > > https://github.com/crash-utility/crash/commit/09fdac65d1ff47267d1ebbe74054f0da549c419b > > I appreciate all of your help. > > Thanks, > Dave > > > ----- Original Message ----- > > ----- Original Message ----- > > > > Takahiro, > > > > Sorry for the previous empty response -- I sent it by mistake while > > putting this 3-message collaborative response together. > > > > > Dave, > > > > > > If you don't like my patches, that is OK. > > > Applying them or not is totally up to you. > > > I will *never* submit my patches again. > > > > Well, that's certainly unfortunate. > > > > > Having said so, I think I would better explain my intentions on the code. > > > > > > On Wed, Jun 29, 2016 at 04:09:01PM -0400, Dave Anderson wrote: > > > > > > > > > > > > Hi Takahiro, > > > > > > > > I applied patches 1/2 and 2/2 from the v5 patchset. But I can't > > > > believe the results are what you intended? > > > > > > > > For example, taking the 4.6 vmcore that you gave to me, here is the > > > > current crash utility's output of "bt -a", where the crashing task > > > > entered crash_kexec() via the sysrq-c page fault exception, and the > > > > tasks on the other cpus have all entered crash_save_cpu() on their > > > > IRQ stack as a result of the shutdown IPI, one from user-space and > > > > the others from the kernel: > > > > > > > > crash> bt -a > > > > PID: 0 TASK: ffff000008dcd900 CPU: 0 COMMAND: "swapper/0" > > > > #0 [ffff800022f42e50] crash_save_cpu at ffff00000812ae44 > > > > #1 [ffff800022f43010] handle_IPI at ffff00000808e718 > > > > #2 [ffff800022f43040] gic_handle_irq at ffff0000080815f8 > > > > #3 [ffff800022f43080] el1_irq at ffff000008084720 > > > > --- <IRQ stack> --- > > > > > > This part of dump shown above is the result of "kdump" mechanisms, > > > and is *not* useful for you to understand what was happening on this > > > CPU when a crash occurred on the other CPU, #3 in this case. > > > > > > > PC: ffff0000080857c0 [arch_cpu_idle+16] > > > > LR: ffff0000080857bc [arch_cpu_idle+12] > > > > SP: ffff000008dc3f10 PSTATE: 60400149 > > > > X29: ffff000008dc3f10 X28: ffff000008dc0000 X27: 0000000000000000 > > > > X26: ffff000008dc5000 X25: ffff000008dc5c1c X24: ffff000008dc5000 > > > > X23: ffff000008dc0000 X22: ffff000008bd0270 X21: ffff000008dc0000 > > > > X20: ffff000008dc5b88 X19: 0000000000000000 X18: 00000000a632f641 > > > > X17: 0000ffff7da57880 X16: ffff0000081d9838 X15: 00000000383a0a79 > > > > X14: 00000000b2b0b162 X13: 000000005f2cbeec X12: 0000000000045a9e > > > > X11: ffff8000213bd800 X10: 0000000000000850 X9: ffff000008dc0000 > > > > X8: 000000010000aa07 X7: 000000000000003d X6: 0015752a00000000 > > > > X5: 0100000000000000 X4: 00000000000001c0 X3: 0000000000000000 > > > > X2: 0000000000000001 X1: 0000000000000000 X0: 0000000000000000 > > > > ORIG_X0: 0000000000000000 SYSCALLNO: 7fffffffffffffff > > > > #4 [ffff000008dc3f10] arch_cpu_idle at ffff0000080857c0 > > > > #5 [ffff000008dc3f20] cpu_startup_entry at ffff0000080f26cc > > > > #6 [ffff000008dc3f80] rest_init at ffff0000087c7930 > > > > #7 [ffff000008dc3fa0] start_kernel at ffff000008b10b70 > > > > > > All you need to know is that CPU#0 was idle at the crash point. > > > Whatever else do you want to know? > > > > Things like: > > > > (1) I want to know whether the active cpu was properly shut down by > > the kdump mechanism. > > (2) I would like to actually "test" that the crash utility's transition from > > the IRQ stack back to the process stack works as expected. (especially > > given > > the relative rarity of kernel crashes that actually occur while operating > > on IRQ stacks) > > (3) There are several other dumping mechanisms that can be utilized on ARM64 > > systems, so again, I'd like it to be obvious that this is kdump. > > (4) When doing a filtered "foreach bt" it would be helpful to simply make it > > obvious that a task was an active, non-crashing, task -- with more > > information > > than a "hanging" exception frame dump. > > (5) All of the other architectures do it. > > > > ... [ cut ] ... > > > > > > But why do you think that it is an improvement to leave out the > > > > transition > > > > to the IRQ stack? > > > > > > I explained the reason in my previous e-mail. > > > I still believe that it is an improvement. > > > > OK, we're just going to have to agree to disagree. > > > > > > In any case, I haven't even started looking at the "bt -f" part of the > > > > patch > > > > because quite frankly, this patchset is so complex that I haven't even > > > > begun to try to understand it. When you said it would be "easier to > > > > maintain", well, perhaps for *you* maybe, but certainly not for me! > > > > > > Do you think so? > > > It is because you are the author. > > > > Maybe so, but I absolutely do think so. > > > > > The changes I made were big and look complex to you, > > > but the resulting code is simple enough for others IMO. > > > > > > I tried to make the main loop of arm64_back_trace_cmd() simple and > > > quite resemble the counterpart of dump_backtrace() in the kernel. > > > > >From my perspective, the current arm64_back_trace_cmd() is far simpler, > > and is essentially a backport of the kernel's dump_backtrace() code. > > > > > The differences come mainly from the facts: > > > 1.you (crash util) need to display not only a one-line frame summary > > > (with a value of stack/frame pointer), but also a full dump of stack > > > > Right, but the current loop remains basically the same as the kernel's. > > > > > 2.your current code will miss a *very important* stack frame that is > > > the one exactly when an interrupt takes place. > > > > > > I believe that the latter is a big improvement. > > > > I'm not following when you say it "will miss" a frame. It's just following > > the links established by the architecture. If you line up the backtrace > > output > > of the current vs. your patch, there are no missing frames, but rather you > > essentially move the "el1_da()" stack frame entry to a lower stack address, > > and your "bt -f" implementation doesn't dump the raw exception frame data. > > > > > Yeah, due to the changes, arm64_unwind_frame() may get a bit complicated, > > > but it is very naive implementation based on the nature of the kernel's > > > stack usage (or PCS for ARM), the complexity is not the result of my poor > > > skill of coding. > > > In other words, the complexity is now *encapsulated* in that function. > > > > I understand what you're saying, and certainly appreciate the depth of > > knowledge that you have, but quite frankly I don't find the changes > > significant enough to reinvent the wheel. > > > > > In addition, I wrote down that function step-by-step, from generic cases > > > to exceptional cases, with a bunch of comments for better understandings. > > > And you will be able to remove such exceptional cases *if* you don't like > > > them. > > > > > > Given those stuffs, I think that my code is easier to maintain. > > > > > > Again, surely up to you. > > > > > > > Without the kernel's backtrace code from which the current code is based, > > > > I don't > > > > have anything to work from anymore. I'm really not sure whether the > > > > coverage of > > > > the "corner cases" you referred to make this effort worth it. I would > > > > like to > > > > see examples of how the current code fails. But anyway, I will continue > > > > to test it to > > > > see if there actually is any significant upgrade from what we already > > > > have in > > > > place. From a kernel debugging perspective, all we really need is a > > > > basic unwinder, > > > > and optionally the full dump of the stack data in between those frames. > > > > And the > > > > current code does do that at a minimum, and there's much to be said for > > > > simplicity. > > > > > > > > ... [ cut ] ... > > > > > > Here is another thing that I would prefer not to change/omit. > > > > > > > > In the current code, the raw exception frame data is dumped as > > > > part of the "bt -[fF]" output, just prior to it being translated > > > > as an exception frame: > > > > > > > > crash> bt -F > > > > PID: 1223 TASK: ffff800020ef5780 CPU: 3 COMMAND: "sh" > > > > ... [ cut ] ... > > > > #5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c > > > > ffff800020b0bb70: ffff800020b0bd40 el1_da+24 > > > > ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063 > > > > ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32 > > > > ffff800020b0bba0: 0000000000000002 textbuf.34610 > > > > ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128 > > > > ffff800020b0bbc0: 000000000000000f 0000000100000000 > > > > ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56 > > > > ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063 > > > > ffff800020b0bbf0: sysrq_crash_op 0000000000000009 > > > > ffff800020b0bc00: 0000000000000000 0000000000000015 > > > > ffff800020b0bc10: 0000000000000120 0000000000000040 > > > > ffff800020b0bc20: 0000000000000001 0000000000000000 > > > > ffff800020b0bc30: log_wait+8 0000000000000000 > > > > ffff800020b0bc40: 0000000000000000 00000000000047d4 > > > > ffff800020b0bc50: ffff800022f337a4 0000000000000000 > > > > ffff800020b0bc60: 0000000000000106 0000000000000001 > > > > ffff800020b0bc70: 0000000000000002 0000000000000106 > > > > ffff800020b0bc80: log_buf_len cont > > > > ffff800020b0bc90: 0000ffff83cc28f0 text.34829+13 > > > > ffff800020b0bca0: sys_write 0000ffff83d266c0 > > > > ffff800020b0bcb0: 0000000000000006 cpu_cgrp_subsys+152 > > > > ffff800020b0bcc0: 0000000000000063 sysrq_crash_op > > > > ffff800020b0bcd0: 0000000000000009 0000000000000000 > > > > ffff800020b0bce0: 0000000000000015 0000000000000120 > > > > ffff800020b0bcf0: 0000000000000040 sys_call_table > > > > ffff800020b0bd00: ffff800020b08000 ffff800020b0bd40 > > > > ffff800020b0bd10: sysrq_handle_crash+12 ffff800020b0bd40 > > > > ffff800020b0bd20: sysrq_handle_crash+32 0000000060400149 > > > > ffff800020b0bd30: cpu_cgrp_subsys+152 [kmalloc-1024] > > > > #6 [ffff800020b0bd40] el1_da at ffff000008084568 > > > > PC: ffff000008457fc8 [sysrq_handle_crash+32] > > > > LR: ffff000008457fb4 [sysrq_handle_crash+12] > > > > SP: ffff800020b0bd40 PSTATE: 60400149 > > > > X29: ffff800020b0bd40 X28: ffff800020b08000 X27: ffff0000087e2000 > > > > X26: 0000000000000040 X25: 0000000000000120 X24: 0000000000000015 > > > > X23: 0000000000000000 X22: 0000000000000009 X21: ffff000008e071b0 > > > > X20: 0000000000000063 X19: ffff000008dda000 X18: 0000000000000006 > > > > X17: 0000ffff83d266c0 X16: ffff0000081c68b8 X15: ffff000008e6cc95 > > > > X14: 0000ffff83cc28f0 X13: ffff000008e6c758 X12: ffff000008dda7a0 > > > > X11: 0000000000000106 X10: 0000000000000002 X9: 0000000000000001 > > > > X8: 0000000000000106 X7: 0000000000000000 X6: ffff800022f337a4 > > > > X5: 00000000000047d4 X4: 0000000000000000 X3: 0000000000000000 > > > > X2: ffff000008dda7b8 X1: 0000000000000000 X0: 0000000000000001 > > > > ORIG_X0: ffff000008dda000 SYSCALLNO: ffff80002104d418 > > > > ... > > > > > > > > whereas with the v5 patchset, the exception frame only gets translated, > > > > but the actual raw memory never gets dumped: > > > > > > I surely remember that you said that would not be an issue > > > when I submitted older version, maybe v1 or v2. > > > > Well, yes, that's true in the case of an exception frame generated > > by the IPI interrupt (or any interrupt), where the eframe contents are > > incidental to the interrupt's arrival, and not nearly as interesting > > as an in-line kernel exception frame generated by a NULL pointer > > reference, BUG(), etc., where the contents of the exception frame > > are completely related to the problem/bug at hand. And in that case, > > the current code does dump the raw contents just before the register > > translation. > > > > > Do you think that those symbolic display are still useful > > > though it is not quite easy to recognize which register has what value? > > > > It's not a major item, but it certainly just dumps the raw memory > > in between frame like anywhere else in the "bt -f" output. Seems > > strange to just omit it there. > > > > > > > > Even more, <ffff800020b0bb80-ffff800020b0bc10> is *not* a stack for > > > do_mem_abort(). It is just wrong and will confuse people. > > > So this is another example of improvement on my patches. > > > > I'm not following -- your patch (as well as the current version) both > > show ffff800020b0bb80-ffff800020b0bc10 below do_mem_abort(): > > > > ... > > #5 [ffff800020b0bb70] do_mem_abort at ffff000008081288 > > ffff800020b0bb70: ffff800020b0bd40 el1_da+24 > > ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063 > > ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32 > > ffff800020b0bba0: 0000000000000002 textbuf.34610 > > ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128 > > ffff800020b0bbc0: 000000000000000f 0000000100000000 > > ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56 > > ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063 > > ffff800020b0bbf0: sysrq_crash_op 0000000000000009 > > ffff800020b0bc00: 0000000000000000 0000000000000015 > > ffff800020b0bc10: 0000000000000120 0000000000000040 > > #6 [ffff800020b0bc20] el1_da at ffff000008084564 > > --- <Exception in kernel> --- > > PC: ffff000008457fc8 [sysrq_handle_crash+32] > > LR: ffff000008457fb4 [sysrq_handle_crash+12] > > SP: ffff800020b0bd40 PSTATE: 60400149 > > ... > > > > The current code does the above, but dumps the raw exception frame > > before printing a stack entry for el1_da(): > > > > ... > > #5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c > > ffff800020b0bb70: ffff800020b0bd40 el1_da+24 > > ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063 > > ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32 > > ffff800020b0bba0: 0000000000000002 textbuf.34610 > > ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128 > > ffff800020b0bbc0: 000000000000000f 0000000100000000 > > ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56 > > ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063 > > ffff800020b0bbf0: sysrq_crash_op 0000000000000009 > > ffff800020b0bc00: 0000000000000000 0000000000000015 > > ffff800020b0bc10: 0000000000000120 0000000000000040 > > ffff800020b0bc20: 0000000000000001 0000000000000000 > > ffff800020b0bc30: log_wait+8 0000000000000000 > > ffff800020b0bc40: 0000000000000000 00000000000047d4 > > ffff800020b0bc50: ffff800022f337a4 0000000000000000 > > ffff800020b0bc60: 0000000000000106 0000000000000001 > > ffff800020b0bc70: 0000000000000002 0000000000000106 > > ffff800020b0bc80: log_buf_len cont > > ffff800020b0bc90: 0000ffff83cc28f0 text.34829+13 > > ffff800020b0bca0: sys_write 0000ffff83d266c0 > > ffff800020b0bcb0: 0000000000000006 cpu_cgrp_subsys+152 > > ffff800020b0bcc0: 0000000000000063 sysrq_crash_op > > ffff800020b0bcd0: 0000000000000009 0000000000000000 > > ffff800020b0bce0: 0000000000000015 0000000000000120 > > ffff800020b0bcf0: 0000000000000040 sys_call_table > > ffff800020b0bd00: ffff800020b08000 ffff800020b0bd40 > > ffff800020b0bd10: sysrq_handle_crash+12 ffff800020b0bd40 > > ffff800020b0bd20: sysrq_handle_crash+32 0000000060400149 > > ffff800020b0bd30: cpu_cgrp_subsys+152 [kmalloc-1024] > > #6 [ffff800020b0bd40] el1_da at ffff000008084568 > > PC: ffff000008457fc8 [sysrq_handle_crash+32] > > LR: ffff000008457fb4 [sysrq_handle_crash+12] > > SP: ffff800020b0bd40 PSTATE: 60400149 > > ... > > > > Yes, again I understand that you feel that the el1_da() frame > > should be at ffff800020b0bc20, but on the other hand, the > > link register/pc under do_mem_abort() points to ffff800020b0bd40, > > and I kind of prefer to show that. I *know* you disagree. > > > > ... [ cut ] ... > > > > > IMO I think that you'd better improve the output of > > > arm64_print_exception_frame() for bt -F. > > > And this is an totally independent issue from other parts of my patches. > > > > I don't see where "bt -F" gets involved in arm64_print_exception_frame()? > > Are you referring to this, which I agree is OK: > > > > @@ -2130,10 +2343,11 @@ arm64_print_exception_frame(struct bt_info *bt, ulong > > pt_regs, int mode, FILE *o > > } > > > > if (is_64_bit) { > > - fprintf(ofp, "ORIG_X0: %016lx SYSCALLNO: %lx", > > - (ulong)regs->orig_x0, (ulong)regs->syscallno); > > - if (mode == USER_MODE) > > + if (mode == USER_MODE) { > > + fprintf(ofp, "ORIG_X0: %016lx SYSCALLNO: %lx", > > + (ulong)regs->orig_x0, > > (ulong)regs->syscallno); > > fprintf(ofp, " PSTATE: %08lx", (ulong)regs->pstate); > > + } > > fprintf(ofp, "\n"); > > } > > > > Anyway, I'm going to be away for the next few days, and will revisit > > this all again next week. > > > > But here's a thought: > > > > There already are "bt -o" and "bt -O" for x86 and x86_64, which allow > > for the existence of coexisting but separate back trace methods. > > I'm thinking that we can keep the current code, while also introducing > > your patchset as an "optional" back trace method. There would be > > arm64_back_trace_cmd_v1() and arm64_back_trace_cmd_v2(), > > arm64_unwind_frame_v1() > > and arm64_unwind_frame_v2(), and any other functions that can't comfortably > > be utilized by both methods. And then "bt -o" would use the optional > > version, or the default method selection could be toggled with "bt -O". > > > > And again, I hope we don't lose you. During the KASLR negotiations, you've > > already seen what an obstinate ass I can be. But it's really part of my > > job as a gatekeeper. > > > > Thanks, > > Dave > > > > > > "Simplicity, simplicity, simplicity." > > - Henry David Thoreau > > > > -- > > Crash-utility mailing list > > Crash-utility@xxxxxxxxxx > > https://www.redhat.com/mailman/listinfo/crash-utility > > > > -- > Crash-utility mailing list > Crash-utility@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/crash-utility -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility