Re: Unable to switch stack frames while using crash

Dave Anderson <anderson@xxxxxxxxxx> · Thu, 23 Jun 2011 11:39:28 -0400 (EDT)

----- Original Message -----

> After analysis, we figured out that the crash occurs in the function
> n_tty_read() in kernel-source/drivers/char/n_tty.c. The oops occurred on
> Linux kernel 2.6.32. Below is the code fragment where the page fault
> occurred. The page fault occurs when executing the statement
> c = tty->read_buf[tty->read_tail].
> 
> /* N.B. avoid overrun if nr == 0 */
> while (nr && tty->read_cnt) {
>
>         int eol;
>
>         eol = test_and_clear_bit(tty->read_tail, tty->read_flags);
>         c = tty->read_buf[tty->read_tail]; /* page fault statement after analyzing oops */

BTW, are you sure about that?  

Presuming that the "tty" pointer is ffff8802cbd54800 as you've shown below,
and therefore tty->read_buf is 0xffff8802cbfe6000 and tty->read_tail is 0,
then the statement above would simply be reading tty->read_buf[0], or
virtual address 0xffff8802cbfe6000.  But the oops shows it faulting on a
virtual address of "5":

BUG: unable to handle kernel NULL pointer dereference at 0000000000000005

Dave

> 
> Below are the contents of the structure tty_struct (at the time of the
> oops). This was passed as an argument to the function n_tty_read().
> 
> tty_struct ffff8802cbd54800
> struct tty_struct { ...
> magic = 21505,
> driver = 0xffff88031b54ea00,
> ops = 0xffffffff8130f650,
> name = "pts9\000\...",
> driver_data = 0xffff88029c8a9668,
> icanon = 1 '\001',
> read_buf = 0xffff8802cbfe6000 "",
> read_head = 0,
> read_tail = 0,
> read_cnt = 0,
> read_flags = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
> canon_data = 0,
> ......................................
> 
> According to the crash utility, the read_cnt field was 0 when the kernel
> oopsed. In that case, the condition while (nr && tty->read_cnt) in the
> code fragment above should have evaluated to false, and the loop body
> should never have been entered. This leads me to think that some other
> thread/task in the kernel must have updated the read_cnt field in
> parallel. However, the crash utility reports that the runqueues of all
> CPUs were idle at the time of the crash, except CPU 1, which was
> executing the user program telnet in kernel context (a system call).
> Below is the runqueue output.
> 
> CPU 0 RUNQUEUE: ffff880033012d80
> CURRENT: PID: 0 TASK: ffffffff814204b0 COMMAND: "swapper"
> RT PRIO_ARRAY: ffff880033012e98
> [no tasks queued]
> CFS RB_ROOT: ffff880033012e10
> [no tasks queued]
> 
> CPU 1 RUNQUEUE: ffff880033032d80
> CURRENT: PID: 13366 TASK: ffff88031b60d580 COMMAND: "telnet"
> RT PRIO_ARRAY: ffff880033032e98
> [no tasks queued]
> CFS RB_ROOT: ffff880033032e10
> [no tasks queued]
> 
> CPU 2 RUNQUEUE: ffff880033052d80
> CURRENT: PID: 0 TASK: ffff88031e0e3540 COMMAND: "swapper"
> RT PRIO_ARRAY: ffff880033052e98
> [no tasks queued]
> CFS RB_ROOT: ffff880033052e10
> [no tasks queued]
> 
> CPU 3 RUNQUEUE: ffff880033072d80
> CURRENT: PID: 0 TASK: ffff88031e113580 COMMAND: "swapper"
> RT PRIO_ARRAY: ffff880033072e98
> [no tasks queued]
> CFS RB_ROOT: ffff880033072e10
> [no tasks queued]
> 
> 
> How is this logically possible? Crash reports that no other tasks are
> running. Alternatively, could some process/thread have run between the
> oops being triggered and kdump capturing the memory image, and updated
> the data structure? I would like to know whether this scenario is
> possible. I kindly request your suggestions/guidance. Please let me know
> if you need any other details.
> 
> 
> Regards
> Shashidhara
> 
> 
> -----Original Message-----
> From: crash-utility-bounces@xxxxxxxxxx
> [mailto:crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson
> Sent: Tuesday, June 21, 2011 7:24 PM
> To: Discussion list for crash utility usage, maintenance and development
> Subject: Re: Unable to switch stack frames while using crash
> 
> 
> 
> ----- Original Message -----
> > Hi Dave,
> >
> > I updated the makedumpfile utility from 1.3.5 to 1.3.7. When I run the
> > command below:
> >
> > makedumpfile -c -d 31 -x vmlinux_temp vmcore vmcore-new
> > The kernel version is not supported.
> > The created dumpfile may be incomplete.
> > check_release: Can't get the kernel version.
> > makedumpfile Failed.
> 
> I see that makedumpfile-1.3.8 was recently released, but it still
> has a LATEST_VERSION of 2.6.36:
> 
> #define OLDEST_VERSION KERNEL_VERSION(2, 6, 15) /* linux-2.6.15 */
> #define LATEST_VERSION KERNEL_VERSION(2, 6, 36) /* linux-2.6.36 */
> 
> You haven't stated what your kernel version is, but it seems makedumpfile
> cannot get past this point. On the other hand, the compressed kdump was
> created, so I'm not entirely clear on what happened there.
> 
> > Is there any other way to extract an ELF-format vmcore file from the
> > kdump-compressed format? Please guide me.
> 
> I don't believe so...
> 
> But I'm not the makedumpfile maintainer, so I'd prefer not to give any
> definitive answers to your questions. I've cc'd the upstream maintainer
> of makedumpfile.
> 
> Thanks,
> Dave
> 
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility
> 
> Information transmitted by this e-mail is proprietary to MphasiS, its
> associated companies and/ or its customers and is intended
> for use only by the individual or entity to which it is addressed, and
> may contain information that is privileged, confidential or
> exempt from disclosure under applicable law. If you are not the
> intended recipient or it appears that this mail has been forwarded
> to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly
> prohibited. In such cases, please notify us immediately at
> mailmaster@xxxxxxxxxxx and delete this mail from your records.
> 
