Castor Fu wrote: > On Wed, 9 Nov 2005, Dave Anderson wrote: > > > Castor Fu wrote: > > > >> Hi guys: > >> > >> We're trying to bring in a newer version of crash, and have > >> some changes which we've made since going to version 4.0-2.10 > >> which I'll outline below, and have attached a patch: > >> > >> 1. Makefile, help.c, extensions.c, new file kw.c > >> Add a keyword expansion mechanism to the 'extend' command. > >> We did this primarily to let us load different extensions > >> based on the dump file itself or the version of crash. > >> > >> +"\n The named <shared-object> is subjected to a simple keyword expansion", > >> +" scheme with the following rules:", > >> +" $(crash_version) version of crash", > >> +" $(corestr-<varname>) string named <varname> taken from the core file", > >> +" $(builddir) build directory for the kernel", > > > > I still have no idea what this does? Can you expand upon how this works, > > what does it do, what does it looks like, can it be ignored, etc? Couldn't > > this kind of thing can be self-contained in each extension module? > > It just seems to be way too "implementation-specific" for lack of a > > better term? > > Consider a set of extension commands which rely on some data structure > which tends to change from version to version of say, a device driver. > The extension code could, with a great deal more work, be written > to extract structure offsets, etc. from the debugging information, > but it's much easier to just '#include' the header file. > > This way, one could have something like > > extend /usr/lib/crash/fc$(corestr-fc_version).o > > in one's .crashrc, and the appropriate extensions could be loaded without > the person analyzing the crash having to figure out which version it came from > right away. > Wow. You guys have gone way beyond whatever I could have imagined for extensions! I'm going to have to let this one stew for a while... > > [ Along these lines, I'm trying to look at a RHEL diskdump and I must > be missing the point about how one pulls together the modules to > look at the dump. I have the rpm kernel-debuginfo-2.6.9-22.EL loaded, and all the > kernel modules have .debug appended on the filename so 'mod -S' doesn't work. ] Not sure why it doesn't work for you. The module.ko files come with the base kernel package. The debuginfo package just supplements them with debug data. > > > One could also implement this with an expect wrapper which would give one much > more generality, but then everyone is going to end up with their own little > wrapper.... > > ... > > >> 6. task.c > >> Be more defensive about the value of tc->processor > >> > > > > Why is it necessary to do a verify_task() again -- it had to > > have gone through verify_task() in order to get here, right? > > What exactly did you see that required this? > > We've seen dumps where the task structures get corrupted by > other bad pointers, and crash tended to just give up the ghost > because of dereferencing tc->processor which would get > set to some ridiculous value. > > I think I've seen errors at both these spots, but one or the > other may have been from searching through the code for tc->processor. > Ok -- I figured that it was due to some unnatural dumpfile. Anyway, it's harmless, and if it works in those cases, that's fine. > > > > > --- crash-4.0-2.10/task.c 2005-11-07 07:44:06 -08:00 > > +++ crash-4.0-2.10-new/task.c 2005-11-07 18:57:19 -08:00 > > @@ -2942,7 +2942,7 @@ > > fprintf(fp, "(SYSRQ)"); > > else if (machdep->flags & INIT) > > fprintf(fp, "(INIT)"); > > - else if (kt->cpu_flags[tc->processor] & NMI) > > + else if (verify_task(tc, 2) && kt->cpu_flags[tc->processor] & NMI) > > fprintf(fp, "(NMI)"); > > else if (tc->task == tt->panic_task) > > fprintf(fp, "(PANIC)"); > > > > This I'm presuming is related to the above, but again, what did you > > see that requires this?: > > > > @@ -4337,7 +4340,8 @@ > > > > tc = FIRST_CONTEXT(); > > for (i = 0; i < RUNNING_TASKS(); i++, tc++) { > > - if (task_has_cpu(tc->task, NULL)) { > > + if (task_has_cpu(tc->task, NULL) > > + && tc->processor >= 0 && tc->processor < NR_CPUS) { > > tt->panic_threads[tc->processor] = tc->task; > > found++; > > } > > > > > >> Also, try to set the panic string correctly in when decoding dump. > > > > This is LKCD specific anyway, right? > > I don't think so. in get_panicmsg() it looks like there's code which > is trying to get the panicmsg from either tt_panicmsg or LKCD. > > This string will get overwritten by looking through the msgbuf... > Yeah -- I should have mentioned that the get_panicmsg() code is only satisfied by ancient MCLX-style dumpfiles. I'm pretty sure nobody still has any of those... So it is LKCD only, which is fine. > > >> 7. x86.c > >> > >> Be a little more liberal deciding that the frame pointer has been included. > >> gcc 3.x can re-schedule code so that instead of > >> > >> push bp > >> mov esp, bp > >> > >> we sometimes see > >> > >> push bp > >> xor eax,eax > >> move esp, bp > >> > >> so detect both sequences. > > > > This looks reasonable. Didn't think anybody built kernels that way anymore... > > You mean nobody builds with frame pointers anymore? I guess we do because it's easier > to debug. Having 'bt' pull out the arguments on the stack is nice. If we can get > similar functionality without frame pointers, we'd be happy to chuck them... > > -castor Thanks, Dave