On Mon, Oct 18, 2010 at 09:56:41AM -0400, Dave Anderson wrote: > > ----- "Dave Anderson" <anderson@xxxxxxxxxx> wrote: > > > ----- "Hu Tao" <hutao@xxxxxxxxxxxxxx> wrote: > > > > > Hi Dave, > > > > > > There is a bug on get_be_long() that causes high 32 bits truncated. > > > As a result, we get wrong registers values from dump file. Patch 1 > > > fixes this. > > > > Good catch! > > > > > Once we can get right cpu registers values, it's better to use the > > > sp/ip for backtracing the active task. This can show a more accurate > > > backtrace, not including those invalid frames beyond sp. pathes 2 and > > > 3 do this on kvmdump case(virsh dump). > > > > > > To verify: run that km_probe.c test module on a x86_64 system, then > > > `echo q > /proc/sysrq-trigger' to trigger the kprobe which does > > > looping in post_handler. Then vrish dump then crash. > > > > However, I'm wondering whether you actually tested this with a > > *crashed* system's dumpfile, and not just with a *live* system dump with > > a contrived set of circumstances. And if so, what differences, if any, > > did you see with the backtraces of the task that either oops'd or called > > panic(), as well as those of the other active tasks that were brought down > > by IP interrupt? > > > > Anyway, I'll give this a thorough testing with a set of sample > > dumpfiles that I have on hand. > > Actually, the patch fails miserably on SMP dumpfiles with segmentation > during initialization. > > And now looking at your patch, I'm wondering whether you even tested this > with an SMP system? My fault. I will do more tests and improve the patch. * test with a SMP system * test with a live dump * test with system panic/oops * test with x86/x86_64 > > The change to qemu_load() here: > > @@ -904,6 +906,9 @@ qemu_load (const struct qemu_device_loader *devices, uint32_t required_features, > if (feof (fp) || ferror (fp)) > break; > > + if (STREQ(d->vtbl->name, "cpu")) > + result->dx86 = d; > + > if (sec == QEMU_VM_SECTION_END || sec == QEMU_VM_SECTION_FULL) > result->features |= features; > } > > That function cycles through the "cpu" devices for *each* cpu > in the system, so this patch will store the device of the last cpu > device it encounters. So in an SMP machine, it will store the > device for the highest cpu only, right? > > And then there's this change to get_kvmdump_regs(): > > @@ -310,7 +311,11 @@ kvmdump_memory_dump(FILE *ofp) > void > get_kvmdump_regs(struct bt_info *bt, ulong *pc, ulong *sp) > { > - machdep->get_stack_frame(bt, pc, sp); > + if (is_task_active(bt->task)) { > + *sp = device_list->dx86->regs[R_ESP]; > + *pc = device_list->dx86->eip; > + } else > + machdep->get_stack_frame(bt, pc, sp); > } > > is_task_active() returns TRUE for all active tasks in an SMP system, > not just the panic task. So it would seem you're going to pass > back the same registers for all active tasks? > > And what's the point of this change to kernel.c? > > diff --git a/kernel.c b/kernel.c > index e399099..2627020 100755 > --- a/kernel.c > +++ b/kernel.c > @@ -16,6 +16,7 @@ > */ > > #include "defs.h" > +#include "qemu-load.h" > #include "xen_hyper_defs.h" > #include <elf.h> > > Also, the change to main.c is unnecessary -- there are dozens of malloc'd > memory areas in the program -- so why go to the bother of free'ing > just this one prior to exiting? > > Anyway, instead of saving the device list, I suggest you do something > like storing the per-cpu IP/SP values in a separate data structure that > can possibly be used as an alternative source for register values for > "live dumps" -- and possibly for crashing systems if usable starting > hooks cannot be determined in the traditional manner. I had thought > of doing something like that in the past, but when I looked at the > register values, I must have run into the get_be_long() issue? Sounds reasonable. Thanks for suggestions. -- Thanks, Hu Tao -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility