Re: crash 4.0-8.9 w/ 2.6.30-rc6

Dave Anderson <anderson@xxxxxxxxxx> · Wed, 27 May 2009 09:23:44 -0400 (EDT)

----- "Mike Snitzer" <snitzer@xxxxxxxxxx> wrote:

> On Wed, May 27 2009 at  8:37am -0400,
> Dave Anderson <anderson@xxxxxxxxxx> wrote:
> 
> > 
> > ----- "Mike Snitzer" <snitzer@xxxxxxxxxx> wrote:
> > 
> > > Hi Dave,
> > > 
> > > crash is failing with the following when I try to throw a 2.6.30-rc6
> > > vmcore at it:
> > > 
> > > crash: invalid structure size: x8664_pda
> > >        FILE: x86_64.c  LINE: 584  FUNCTION: x86_64_cpu_pda_init()
> > > 
> > > [/usr/bin/crash] error trace: 449c7f => 4ce815 => 4d00cf => 50936d
> > > 
> > >   50936d: SIZE_verify+168
> > >   4d00cf: (undetermined)
> > >   4ce815: x86_64_init+3205
> > >   449c7f: main_loop+152
> > > 
> > > I can dig deeper but your help would be very much appreciated.
> > > 
> > > Mike
> > 
> > The venerable "been-there-since-the-beginning-of-x86_64" x8664_pda
> > data structure no longer exists.  It was a per-cpu array of a fundamental
> > data structure that things like "current", the per-cpu magic number, the
> > cpu number, the current kernel stack pointer, the per-cpu IRQ stack pointer,
> > etc. all came from:
> > 
> > /* Per processor datastructure. %gs points to it while the kernel runs */
> > struct x8664_pda {
> >         struct task_struct *pcurrent;   /* Current process */
> >         unsigned long data_offset;      /* Per cpu data offset from linker address */
> >         unsigned long kernelstack;  /* top of kernel stack for current */
> >         unsigned long oldrsp;       /* user rsp for system call */
> > #if DEBUG_STKSZ > EXCEPTION_STKSZ
> >         unsigned long debugstack;   /* #DB/#BP stack. */
> > #endif
> >         int irqcount;               /* Irq nesting counter. Starts with -1 */
> >         int cpunumber;              /* Logical CPU number */
> >         char *irqstackptr;      /* top of irqstack */
> >         int nodenumber;             /* number of current node */
> >         unsigned int __softirq_pending;
> >         unsigned int __nmi_count;       /* number of NMI on this CPUs */
> >         int mmu_state;
> >         struct mm_struct *active_mm;
> >         unsigned apic_timer_irqs;
> > } ____cacheline_aligned_in_smp;
> > 
> > There have been upstream rumblings about replacing it with a more efficient
> > per-cpu implementation for some time now, but I haven't studied how the new
> > scheme works yet.  It will be a major re-work for the crash utility, so you're
> > pretty much out of luck for now.  (Try "gdb vmlinux vmcore" for basic info)
> 
> Ah OK.  I was just looking to get a stack trace.  Unfortunately gdb
> isn't playing nice either:
> 
> (gdb) bt
> #0  kstat_irqs_cpu (irq=<value optimized out>, cpu=2) at kernel/irq/handle.c:555
> Cannot access memory at address 0xffff88007e5e7d50

Mike,

Try the "--minimal" option that the IBM guys put into 4.0-7.1:

         - Implementation of a "--minimal" command line option, which brings 
           up a crash session that is restricted to the "log", "dis", "rd", 
           "sym", "eval" and "exit" commands.  This option may provide a way to 
           extract some minimal/quick information from a corrupted or truncated 
           dumpfile, or in situations where one of the several kernel subsystem 
           initialization routines, which are not called, would abort the
           crash session.  (sharyath@xxxxxxxxxx, sachinp@xxxxxxxxxx)

So just enter this:

 $ crash --minimal vmlinux vmcore

And you should at least get the kernel trace info with the "log" command.
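
If the "log" output is long, something like the following rough sketch could pull just the backtrace entries out of it.  It assumes the log text has already been saved to a file by whatever means is convenient, and that the entries use the usual " [<address>] symbol+offset/size" layout after a "Call Trace:" line.  The function name extract_call_traces and the sample text are made up for illustration; they are not part of crash or the kernel.

```python
import re

# One backtrace entry, e.g. " [<ffffffff80000010>] some_func+0x10/0x40".
# An optional "? " (unreliable frame marker) before the symbol is skipped.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(?:\?\s+)?(\S+)")


def extract_call_traces(log_text):
    """Return a list of traces; each trace is a list of (address, symbol)."""
    traces = []
    current = None  # the trace being collected, or None when outside one
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            current = []
            traces.append(current)
            continue
        if current is None:
            continue
        match = FRAME_RE.search(line)
        if match:
            current.append((match.group(1), match.group(2)))
        else:
            # First line without a frame ends the trace (a simplification).
            current = None
    return traces


# Made-up sample text, only to show the expected input shape.
SAMPLE_LOG = """\
example: made-up message before the oops
Call Trace:
 [<ffffffff80000010>] example_func_a+0x10/0x40
 [<ffffffff80000020>] ? example_func_b+0x8/0x20
example: made-up message after the trace
"""

if __name__ == "__main__":
    for trace in extract_call_traces(SAMPLE_LOG):
        for address, symbol in trace:
            print(address, symbol)
```

Ending a trace at the first non-matching line keeps the sketch short, but it would cut a trace at a bare stack marker line, so treat it as a starting point only.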

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility