Hi Dave, Thanks for the help. I have some doubts regarding kdump and crash utility. I am analyzing a vmcore dump caused by an oops in customer location using crash utility.The oops report is below [345132.723424] BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 [345132.724928] IP: [<ffffffff811f03b3>] n_tty_read+0x58c/0x818 [345132.726100] PGD 2c8e03067 PUD 2cbd88067 PMD 0 [345132.727187] Oops: 0000 [#1] SMP [345132.727879] last sysfs file: /sys/block/loop7/dev [345132.728935] CPU 1 [345132.729396] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables strmfs_mod bond0 ipmi_devintf hpwdt sctp ipv6 crc32c libcrc32c loop ipmi_si tpm_tis ipmi_msghandler hpilo tpm tpm_bios psmouse serio_raw shpchp pci_hotplug container processor evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sg sr_mod cdrom ide_pci_generic ide_core usbhid hid ata_piix ata_generic libata ehci_hcd bnx2 uhci_hcd e1000e cciss scsi_mod button thermal fan thermal_sys edd [last unloaded: scsi_wait_scan] [345132.739511] Pid: 13366, comm: telnet Not tainted 2.6.32-cdma-18-amd64 #1 ProLiant DL380 G6 [345132.741423] RIP: 0010:[<ffffffff811f03b3>] [<ffffffff811f03b3>] n_tty_read+0x58c/0x818 [345132.743220] RSP: 0018:ffff88031ce75da8 EFLAGS: 00010246 [345132.744469] RAX: 0000000000000000 RBX: ffff8802cbd54a68 RCX: 000000000061c044 [345132.746061] RDX: 0000000000000005 RSI: ffff88031ce75e87 RDI: ffff8802cbd54d1c [345132.747726] RBP: ffff88031ce75eb8 R08: 0000000000000000 R09: 0000000000000000 [345132.749391] R10: 0000000000616680 R11: 0000000000000246 R12: 000000000061c044 [345132.750981] R13: ffff8802cbd54800 R14: 0000000000000000 R15: 7fffffffffffffff [345132.752650] FS: 00007ffff7fee6f0(0000) GS:ffff880033020000(0000) knlGS:0000000000000000 [345132.754569] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [345132.755915] CR2: 0000000000000005 CR3: 000000030c408000 CR4: 00000000000006e0 [345132.757579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [345132.759169] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [345132.760778] Process telnet (pid: 13366, threadinfo ffff88031ce74000, task ffff88031b60d580) [345132.762707] Stack: [345132.763162] ffff88031b60d580 ffff88031b60d580 ffff88031b60d580 ffff88031b60d580 [345132.764791] <0> 000000000061c02b 0000000000000000 0000000000000000 000000000061c02a [345132.766510] <0> ffff8802de651a40 ffff8802cbd549c0 ffff8802cbd54c90 ffff8802cbd54d1c [345132.768270] Call Trace: [345132.768877] [<ffffffff81045f84>] ? default_wake_function+0x0/0xf [345132.770309] [<ffffffff811ebf7e>] tty_read+0x7d/0xba [345132.771526] [<ffffffff810ebcc8>] vfs_read+0xab/0x167 [345132.772541] [<ffffffff810ebe48>] sys_read+0x47/0x6f [345132.773526] [<ffffffff8100bbc2>] system_call_fastpath+0x16/0x1b [345132.774652] Code: 00 41 8b 85 5c 02 00 00 48 8b 9d 78 ff ff ff f0 0f b3 03 45 19 f6 49 63 95 5c 02 00 00 49 8b 85 50 02 00 00 48 8b bd 48 ff ff ff <0f> be 1c 10 e8 fc 6b 0e 00 48 89 c6 41 8b 85 5c 02 00 00 41 ff [345132.778840] RIP [<ffffffff811f03b3>] n_tty_read+0x58c/0x818 [345132.780107] RSP <ffff88031ce75da8> [345132.780969] CR2: 0000000000000005 [345132.781786] hpwdt: New timer passed in is 120 seconds. [345132.782942] hpwdt: timer reset to 120 for kdump After analysis, we figured out that the crash occurs in the function n_read_tty of kernel-source/drivers/char/n_tty.c . The oops occurred on linux kernel 2.6.32. Below is the code fragment where the page fault occurred. The page fault occurs when executing the statement c = tty->read_buf[tty->read_tail] . /* N.B. avoid overrun if nr == 0 */ while (nr && tty->read_cnt) { int eol; eol = test_and_clear_bit(tty->read_tail, tty->read_flags); c = tty->read_buf[tty->read_tail]; // page fault statement after analyzing oops spin_lock_irqsave(&tty->read_lock, flags); tty->read_tail = ((tty->read_tail+1) & (N_TTY_BUF_SIZE-1)); tty->read_cnt--; if (eol) { /* this test should be redundant: * we shouldn't be reading data if * canon_data is 0 */ if (--tty->canon_data < 0) tty->canon_data = 0; } spin_unlock_irqrestore(&tty->read_lock, flags); Below is the contents of the structure tty_struct ( at the time of oops ). This was passed as an argument to the function n_read_tty(). tty_struct ffff8802cbd54800 struct tty_struct { ... magic = 21505, driver = 0xffff88031b54ea00, ops = 0xffffffff8130f650, name = "pts9\000\...", driver_data = 0xffff88029c8a9668, icanon = 1 '\001', read_buf = 0xffff8802cbfe6000 "", read_head = 0, read_tail = 0, read_cnt = 0, read_flags = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, canon_data = 0, ...................................... As per crash utility the field read_cnt is 0 when kernel oopsed.In that case, the statement while (nr && tty->read_cnt) in the above code fragment should have failed. This leads me to think that there was some other thread/task in kernel which should have updated the read_cnt field in parallel. However the crash utility reports that the runqueue of all CPUs at the time of crash as idle. Except CPU1 which was executing the user program telnet in kernel context ( system call ). Below is the runqueue output. CPU 0 RUNQUEUE: ffff880033012d80 CURRENT: PID: 0 TASK: ffffffff814204b0 COMMAND: "swapper" RT PRIO_ARRAY: ffff880033012e98 [no tasks queued] CFS RB_ROOT: ffff880033012e10 [no tasks queued] CPU 1 RUNQUEUE: ffff880033032d80 CURRENT: PID: 13366 TASK: ffff88031b60d580 COMMAND: "telnet" RT PRIO_ARRAY: ffff880033032e98 [no tasks queued] CFS RB_ROOT: ffff880033032e10 [no tasks queued] CPU 2 RUNQUEUE: ffff880033052d80 CURRENT: PID: 0 TASK: ffff88031e0e3540 COMMAND: "swapper" RT PRIO_ARRAY: ffff880033052e98 [no tasks queued] CFS RB_ROOT: ffff880033052e10 [no tasks queued] CPU 3 RUNQUEUE: ffff880033072d80 CURRENT: PID: 0 TASK: ffff88031e113580 COMMAND: "swapper" RT PRIO_ARRAY: ffff880033072e98 [no tasks queued] CFS RB_ROOT: ffff880033072e10 [no tasks queued] How is this logically possible. Crash reports there are no tasks running currently. Or before the oops trigger and kdump capturing the memory image, some process/thread ran which could have updated the data structure. I wanted to know if this scenario is possible. I kindly request your suggestion/guidance. Please let me know if you need any other details. Regards Shashidhara -----Original Message----- From: crash-utility-bounces@xxxxxxxxxx [mailto:crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson Sent: Tuesday, June 21, 2011 7:24 PM To: Discussion list for crash utility usage,maintenance and development Subject: Re: Unable to switch stack frames while using crash ----- Original Message ----- > Hi Dave, > > I updated the makedumpfile utility from 1.3.5 to 1.3.7 . When I run the > below command > > makedumpfile -c -d 31 -x vmlinux_temp vmcore vmcore-new > The kernel version is not supported. > The created dumpfile may be incomplete. > check_release: Can't get the kernel version. > makedumpfile Failed. I see that makedumpfile-1.3.8 was recently released, but it still has a LATEST_VERSION of 2.6.36: #define OLDEST_VERSION KERNEL_VERSION(2, 6, 15)/* linux-2.6.15 */ #define LATEST_VERSION KERNEL_VERSION(2, 6, 36)/* linux-2.6.36 */ You haven't stated what your kernel version is, but it seems makedumpfile cannot get past this point. On the other hand, the compressed kdump was created, so I'm not entirely clear. > Is there any other way to extract the ELF style vmcore file from the > kdump compressed format. Please guide me. I don't believe so... But I'm not the makedumpfile maintainer, so I'd prefer not to give any definitive answers to your questions. I've cc'd the upstream maintainer of makedumpfile. Thanks, Dave -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility Information transmitted by this e-mail is proprietary to MphasiS, its associated companies and/ or its customers and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If you are not the intended recipient or it appears that this mail has been forwarded to you without proper authority, you are notified that any use or dissemination of this information in any manner is strictly prohibited. In such cases, please notify us immediately at mailmaster@xxxxxxxxxxx and delete this mail from your records. -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility