----- Original Message -----
> 2013/3/28 Dave Anderson <anderson@xxxxxxxxxx>:
> >
> > ----- Original Message -----
> >> 2013/3/27 Dave Anderson <anderson@xxxxxxxxxx>:
> >> >
> >> > ----- Original Message -----
> >> >> 2013/3/26 Dave Anderson <anderson@xxxxxxxxxx>:
> >> >> >
> >> >> > ----- Original Message -----
> >> >> >> Hi, list.
> >> >> >>
> >> >> >> I use crash-utility to analyse crash dump core from ARM soc. When I
> >> >> >> execute command below, I get the error "crash: read error: kernel
> >> >> >> virtual address: c0c1e040 type: "first vmap_area va_start"". I also
> >> >> >> test it by gdb. It works fine. The Linux kernel's version is v3.0.8.
> >> >> >>
> >> >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux Vmcore
> >> >> >>
> >> >> >> crash 6.1.4
> >> >> >> Copyright (C) 2002-2013 Red Hat, Inc.
> >> >> >> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> >> >> >> Copyright (C) 1999-2006 Hewlett-Packard Co
> >> >> >> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> >> >> >> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> >> >> >> Copyright (C) 2005, 2011 NEC Corporation
> >> >> >> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> >> >> >> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> >> >> >> This program is free software, covered by the GNU General Public License,
> >> >> >> and you are welcome to change it and/or distribute copies of it under
> >> >> >> certain conditions. Enter "help copying" to see the conditions.
> >> >> >> This program has absolutely no warranty. Enter "help warranty" for details.
> >> >> >>
> >> >> >> GNU gdb (GDB) 7.3.1
> >> >> >> Copyright (C) 2011 Free Software Foundation, Inc.
> >> >> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >> >> >> This is free software: you are free to change and redistribute it.
> >> >> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >> >> >> and "show warranty" for details.
> >> >> >> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
> >> >> >>
> >> >> >> crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
> >> >> >>
> >> >> >> Errors like the one above typically occur when the kernel and memory source
> >> >> >> do not match. These are the files being used:
> >> >> >>
> >> >> >> KERNEL: vmlinux
> >> >> >> DUMPFILE: Vmcore
> >> >> >
> >> >> > You've answered your own question -- you should always see errors if the
> >> >> > vmlinux kernel does not match the kernel crashed system.
> >> >> >
> >> >> > If you cannot find/access the original vmlinux file that was being run
> >> >> > by the crashed kernel, then get the /boot/System.map file of the crashed
> >> >> > kernel, and enter it on the command line:
> >> >> Thanks for your reply.
> >> >>
> >> >> The vmlinux, include debug information, and crash kernel, is
> >> >> cross-compile built and produced together. I couldn't understand why
> >> >> crash throw this warning "kernel and source doesn't match".
> >> >>
> >> >> >
> >> >> > $ crash vmlinux Vmcore System.map
> >> >> >
> >> >> > The crash utility will replace all of the invalid symbol values from the
> >> >> > "wrong" vmlinux file with their correct values from the System.map file.
> >> >>
> >> >> A moment ago. I rebuilt the arm kernel source again. And took
> >> >> "echo c > /proc/sysrq-trigger" command to trigger system panic. The
> >> >> status lists below.
> >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux0327 Vmcore0327
> >> >>
> >> >> crash 6.1.4
> >> >> Copyright (C) 2002-2013 Red Hat, Inc.
> >> >> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> >> >> Copyright (C) 1999-2006 Hewlett-Packard Co
> >> >> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> >> >> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> >> >> Copyright (C) 2005, 2011 NEC Corporation
> >> >> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> >> >> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> >> >> This program is free software, covered by the GNU General Public License,
> >> >> and you are welcome to change it and/or distribute copies of it under
> >> >> certain conditions. Enter "help copying" to see the conditions.
> >> >> This program has absolutely no warranty. Enter "help warranty" for details.
> >> >>
> >> >> GNU gdb (GDB) 7.3.1
> >> >> Copyright (C) 2011 Free Software Foundation, Inc.
> >> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >> >> This is free software: you are free to change and redistribute it.
> >> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >> >> and "show warranty" for details.
> >> >> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
> >> >>
> >> >> please wait... (gathering kmem slab cache data)
> >> >> crash: read error: kernel virtual address: c0c91840 type: "kmem_cache buffer"
> >> >>
> >> >> crash: unable to initialize kmem slab cache subsystem
> >> >>
> >> >> WARNING: invalid note (n_type != NT_PRSTATUS)
> >> >>
> >> >> WARNING: could not retrieve crash_notes
> >> >> please wait... (gathering task table data)
> >> >> crash: cannot read pid_hash upid
> >> >>
> >> >> crash: cannot read pid_hash upid
> >> >> please wait... (determining panic task)
> >> >> WARNING: cannot get stackframe for task
> >> >> KERNEL: vmlinux0327
> >> >> DUMPFILE: Vmcore0327
> >> >> CPUS: 1
> >> >> DATE: Thu Jan 1 08:00:00 1970
> >> >> UPTIME: 00:00:00
> >> >> LOAD AVERAGE: 0.00, 0.00, 0.00
> >> >> TASKS: 1
> >> >> NODENAME: 10.38.50.241
> >> >> RELEASE: 3.0.8-00010-gb7f16a3-dirty
> >> >> VERSION: #339 Wed Mar 27 10:39:43 CST 2013
> >> >> MACHINE: armv7l (unknown Mhz)
> >> >> MEMORY: 19 MB
> >> >> PANIC: ""
> >> >> PID: 0
> >> >> COMMAND: "swapper"
> >> >> TASK: c02e0620 [THREAD_INFO: c02dc000]
> >> >> CPU: 0
> >> >> STATE: TASK_RUNNING (ACTIVE)
> >> >> WARNING: panic task not found
> >> >>
> >> >> crash>
> >> >>
> >> >> It also didn't works so fine. Then I appended system.map, the output
> >> >> result is also the same.
> >> >
> >> > OK, so then it's not clear to me why you're seeing those errors.
> >> >
> >> > Was the dumpfile created using kdump? It almost looks like the dump
> >> > was taken while the system was still running? Have you *ever* created
> >> > a dumpfile that resulted in an error-free crash session?
> >>
> >> Yes, the dumpfile is created by kdump. The dump was taken by
> >> "echo c > /proc/sysrq-trigger".
> >>
> >> I will try another case by inserting a panic module tomorrow.
> >>
> >> >
> >> > Perhaps the ARM users on this list have seen this kind of thing?
> >> >
> >> > If you enter "crash -d8 ..." on the command line, you may get a better
> >> > picture of what leads up to the errors shown above, and of most
> >> > interest, the readmem() calls that generate the errors. If you
> >> > see a "crash: read error: ...", then that means that the dumpfile
> >> > doesn't contain the physical page associated with the virtual
> >> > address shown. But it's not clear whether the address itself
> >> > is legitimate, i.e., was it gathered from the wrong location.
> >>
> >> Sounds reasonable.
> >>
> >> >
> >> >>
> >> >> I try GDB to test it.
> >> >> hfli@pc1935:~/work/crash-utility$ ./gdb-7.5/gdb/gdb vmlinux0327 Vmcore0327
> >> >> GNU gdb (GDB) 7.5
> >> >> Copyright (C) 2012 Free Software Foundation, Inc.
> >> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >> >> This is free software: you are free to change and redistribute it.
> >> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >> >> and "show warranty" for details.
> >> >> This GDB was configured as "--host=x86 --target=arm-linux-gnueabi".
> >> >> For bug reporting instructions, please see:
> >> >> <http://www.gnu.org/software/gdb/bugs/>...
> >> >> Reading symbols from /home/hfli/work/crash-utility/vmlinux0327...done.
> >> >>
> >> >> warning: exec file is newer than core file.
> >> >
> >> > Again, this bothers me -- why is it "newer" than the core file?
> >> > Are you sure that they are *exactly* the same?
> >>
> >> I am sure they are *exactly* the same. :-)
> >>
> >> I'm not clear the internals of how to judge exec file and core file.
> >
> > gdb is warning that it appears that you must have compiled the vmlinux0327
> > after the Vmcore0327 dumpfile was created? Perhaps it's because you copied
> > the two files to the host system where you're running gdb from in the
> > "wrong" order.
> >
> > What I was trying to confirm is that when you rebuilt the vmlinux file
> > with debuginfo data, that you also *installed* that rebuilt kernel onto
> > the target system prior to crashing it.
> >
> >>
> >> >
> >> >> [New LWP 278]
> >> >> #0 0xc0155f7c in sysrq_handle_crash (key=99) at drivers/tty/sysrq.c:134
> >> >> 134             *killer = 1;
> >> >> (gdb) list
> >> >> 129     {
> >> >> 130             char *killer = NULL;
> >> >> 131
> >> >> 132             panic_on_oops = 1;      /* force panic */
> >> >> 133             wmb();
> >> >> 134             *killer = 1;
> >> >> 135     }
> >> >> 136     static struct sysrq_key_op sysrq_crash_op = {
> >> >> 137             .handler = sysrq_handle_crash,
> >> >> 138             .help_msg = "Crash",
> >> >> (gdb)
> >> >>
> >> >> gdb also works fine.
> >> >>
> >> >
> >> > It works fine for gdb in the very limited case above. The crash utility
> >> > is also "working fine" for a much more expansive access of the dumpfile.
> >> > But if you tried to access the same locations in the dumpfile that the
> >> > crash utility is doing during its initialization, then gdb would also
> >> > fail.
> >> >
> >> > Let's take a simple example -- in your first email, you saw this error:
> >> >
> >> >   crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
> >> >
> >> > which came from here:
> >> >
> >> >         if (vt->flags & USE_VMAP_AREA) {
> >> >                 get_symbol_data("vmap_area_list", sizeof(void *), &vmap_area);
> >> >                 if (!vmap_area)
> >> >                         return 0;
> >> >                 if (!readmem(vmap_area - OFFSET(vmap_area_list) +
> >> >                     OFFSET(vmap_area_va_start), KVADDR, &vmalloc_start,
> >> >                     sizeof(void *), "first vmap_area va_start", RETURN_ON_ERROR))
> >> >                         non_matching_kernel();
> >> >
> >> > If I look at a sample ARM dumpfile I have, I see this:
> >> >
> >> >   crash> p vmap_area_list
> >> >   vmap_area_list = $8 = {
> >> >     next = 0xc30d4d78,
> >> >     prev = 0xc06702b8
> >> >   }
> >> >
> >> > where the "next" pointer of 0xc30d4d78 above points to the "list" member
> >> > of a vmap_area structure:
> >> >
> >> >   crash> struct vmap_area
> >> >   struct vmap_area {
> >> >       long unsigned int va_start;
> >> >       long unsigned int va_end;
> >> >       long unsigned int flags;
> >> >       struct rb_node rb_node;
> >> >       struct list_head list;          <== "next" points here
> >> >       struct list_head purge_list;
> >> >       void *private;
> >> >       struct rcu_head rcu_head;
> >> >   }
> >> >   SIZE: 52
> >> >   crash>
> >> >
> >> > And I can dump that vmap_area structure like this:
> >> >
> >> >   crash> struct -x vmap_area -l vmap_area.list 0xc30d4d78
> >> >   struct vmap_area {
> >> >     va_start = 0xbf000000,
> >> >     va_end = 0xbf005000,
> >> >     flags = 0x4,
> >> >     rb_node = {
> >> >       rb_parent_color = 0xc2ca076d,
> >> >       rb_right = 0x0,
> >> >       rb_left = 0x0
> >> >     },
> >> >     list = {
> >> >       next = 0xc2ca0778,
> >> >       prev = 0xc0411ed4
> >> >     },
> >> >     purge_list = {
> >> >       next = 0x0,
> >> >       prev = 0x0
> >> >     },
> >> >     private = 0xc3396860,
> >> >     rcu_head = {
> >> >       next = 0x0,
> >> >       func = 0
> >> >     }
> >> >   }
> >> >
> >> > But your kernel found a "vmap_area_list.next" pointer of c0c1e040,
> >> > but it was not accessible from the dumpfile.
> >> >
> >> > So either:
> >> >
> >> >  (1) the "vmap_area_list" symbol value was not correct, or
> >> >  (2) the page containing the first vmap_area structure was
> >> >      not included in the dumpfile.
> >> >
> >> > Problem (1) can happen if your crashed kernel doesn't match the
> >> > vmlinux file, i.e., the symbol values don't match. But if the
> >> > "vmap_area_list" symbol was correct, then (2) must have occurred,
> >> > and that should never happen unless the dumpfile was corrupted or
> >> > was created incorrectly.
> >> >
> >>
> >> Agree.
> >>
> >> Thanks for your patience again.
> >>
> >> For my case, the crashkernel cmdline of crash kernel is
> >> crashkernel=20M@10M. When the capture kernel launch, the
> >> elfcorehdr=0x1d00000, and the initialization of /proc/vmcore will fail
> >> with WARN_ON(pfn_valid(pfn)) throwing.
> >>
> >> The routine is
> >> vmcore_init->parse_crash_elf_headers->read_from_oldmem->copy_oldmem_page->ioremap->__arm_ioremap->arch_ioremap_caller->__arm_ioremap_caller->__arm_ioremap_pfn_caller->WARN_ON(pfn_valid(pfn)).
> >>
> >> My temporary solution is comment the WARN_ON() to make
> >> /proc/vmcore work.
> >>
> >> May my comment method corrupt the vmcore?
> >
> > Does the crash session come up cleanly?
> >
> > I don't know about the arm_ioremap issue -- that's for the ARM guys to
> > answer.
> >
> > I'm not familiar with the specifics on how the kernel's vmcore creation
> > works, but do you see differences in the contents of the PT_LOAD segments
> > after applying your temporary solution? In other words, if you do this
> > with an old vmcore vs. a new vmcore:
> >
> > $ readelf -a vmcore
> > ELF Header:
> >   Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
> >   Class: ELF32
> >   Data: 2's complement, little endian
> >   Version: 1 (current)
> >   OS/ABI: UNIX - System V
> >   ABI Version: 0
> >   Type: CORE (Core file)
> >   Machine: ARM
> >   Version: 0x1
> >   Entry point address: 0x0
> >   Start of program headers: 52 (bytes into file)
> >   Start of section headers: 0 (bytes into file)
> >   Flags: 0x0
> >   Size of this header: 52 (bytes)
> >   Size of program headers: 32 (bytes)
> >   Number of program headers: 3
> >   Size of section headers: 0 (bytes)
> >   Number of section headers: 0
> >   Section header string table index: 0
> >
> > There are no sections in this file.
> >
> > There are no sections to group in this file.
> >
> > Program Headers:
> >   Type   Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
> >   NOTE   0x000094  0x00000000 0x00000000 0x00514   0x00514       0
> >   LOAD   0x0005a8  0xc0000000 0xc0000000 0x2000000 0x2000000 RWE 0
> >   LOAD   0x20005a8 0xc2800000 0xc2800000 0x1800000 0x1800000 RWE 0
> >
> > There is no dynamic section in this file.
> >
> > There are no relocations in this file.
> >
> > No version information found in this file.
> >
> > Notes at offset 0x00000094 with length 0x00000514:
> >   Owner       Data size   Description
> >   CORE        0x00000094  NT_PRSTATUS (prstatus structure)
> >   VMCOREINFO  0x00000452  Unknown note type: (0x00000000)
> > $
> >
> > Are the LOAD sections different?
>
> hfli@msh-pc1935:~/work/crash-utility$ readelf -a Vmcore308
> ELF Header:
>   Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
>   Class: ELF32
>   Data: 2's complement, little endian
>   Version: 1 (current)
>   OS/ABI: UNIX - System V
>   ABI Version: 0
>   Type: CORE (Core file)
>   Machine: ARM
>   Version: 0x1
>   Entry point address: 0x0
>   Start of program headers: 52 (bytes into file)
>   Start of section headers: 0 (bytes into file)
>   Flags: 0x0
>   Size of this header: 52 (bytes)
>   Size of program headers: 32 (bytes)
>   Number of program headers: 3
>   Size of section headers: 0 (bytes)
>   Number of section headers: 0
>   Section header string table index: 0
>
> There are no sections in this file.
>
> There are no sections to group in this file.
>
> Program Headers:
>   Type   Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
>   NOTE   0x000094  0x00000000 0x00000000 0x000a8   0x000a8       0
>   LOAD   0x00013c  0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
>   LOAD   0xa0013c  0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0
>
> There is no dynamic section in this file.
>
> There are no relocations in this file.
>
> No version information found in this file.
>
> Notes at offset 0x00000094 with length 0x000000a8:
>   Owner       Data size   Description
>   CORE        0x00000094  NT_PRSTATUS (prstatus structure)
>
> ---
> I notice Notes section has not _VMCOREINFO_.
>
> The following is my step of using kdump and crash utility.
>
> 1. built linux kernel source
> 2. Put arch/arm/boot/uImage to tftp server;
>    Put arch/arm/boot/uImage to nfs server.(kernel launch rootfs by NFS)
> 3. bootup uImage with "crashkernel=20M@10M"
> 4. load uImage of capture kernel.
>    $./sbin/kexec -p --atags --append="console=ttyAM0,38400n8 root=/dev/nfs rw nfsroot=10.38.50.248:/nfs/nfs ip=10.38.50.241 loglevel=15 rdinit=/rdinit" /uImagetahoe308
> 5. inserting panic module to trigger panic.
>    $insmod module.ko
> 6. capture kernel boots up. (In the progress of booting, capture will
>    initialize /proc/vmcore. if the initialization of vmcore fails,
>    /proc/vmcore won't existence.)
> 7. use _cp_ tool dump the vmcore
>    $cp /proc/vmcore /Vmcore308
> 8. copy vmlinux & Vmcore308 to crash working directory and use crash
>    utility analyse the Vmcore 308.
>
> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux308 Vmcore308
>
> crash 6.1.4
> Copyright (C) 2002-2013 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> GNU gdb (GDB) 7.3.1
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
>
> crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
>
> Errors like the one above typically occur when the kernel and memory source
> do not match. These are the files being used:
>
> KERNEL: vmlinux308
> DUMPFILE: Vmcore308
>
> --
> Unfortunately, the crash also read error and deduce the kernel and
> memory source don't match.
>
> The vmcore initialization looks like fine. and copying the dump file
> of /proc/vmcore also works fine.
>
> I couldn't know whether and why the vmcore is corrupt.

I don't know either, but in the case above, kernel virtual address c0c1e040
doesn't fit in the virtual address ranges declared in the vmcore header:

> Program Headers:
>   Type   Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
>   NOTE   0x000094  0x00000000 0x00000000 0x000a8   0x000a8       0
>   LOAD   0x00013c  0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
>   LOAD   0xa0013c  0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0

If you go through the exercise I showed a few messages back, i.e., look at
the kernel's vmap_area_list list_head structure by entering "p vmap_area_list",
you should see its "next" pointer containing the c0c1e040 address. But the
vmcore shows a hole between c0a00000 and c1e00000.

Dave

>
> Thanks.
> >
> > Anyway, if the crash session comes up cleanly when you apply your temporary
> > solution, then clearly you've identified the problem at hand.
> >
> > Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
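
The range check described in the reply above can be reproduced by hand. What
follows is only a minimal sketch, not part of the crash utility: the helper
names are hypothetical, and the segment list is transcribed manually from the
two LOAD entries (VirtAddr, MemSiz) in the "readelf -a Vmcore308" output quoted
in this thread. It tests whether a kernel virtual address is covered by any
LOAD segment and lists the gaps between segments.

```python
# Hypothetical helper, not crash-utility code. The (VirtAddr, MemSiz) pairs
# are copied by hand from the LOAD lines of the quoted readelf output.
LOAD_SEGMENTS = [
    (0xC0000000, 0x00A00000),
    (0xC1E00000, 0x06200000),
]


def find_segment(vaddr, segments=LOAD_SEGMENTS):
    """Return (start, end) of the LOAD segment containing vaddr, or None."""
    for start, size in segments:
        if start <= vaddr < start + size:
            return (start, start + size)
    return None


def find_holes(segments=LOAD_SEGMENTS):
    """Return the (start, end) gaps between consecutive LOAD segments."""
    ordered = sorted(segments)
    holes = []
    for (start, size), (next_start, _) in zip(ordered, ordered[1:]):
        end = start + size
        if end < next_start:
            holes.append((end, next_start))
    return holes


if __name__ == "__main__":
    addr = 0xC0C1E040  # address from the "first vmap_area va_start" read error
    print(hex(addr), "covered by", find_segment(addr))
    for lo, hi in find_holes():
        print("hole: %#x-%#x (%d MB)" % (lo, hi, (hi - lo) >> 20))
```

With these inputs the only gap is c0a00000-c1e00000, which is 20 MB wide, and
per the PhysAddr column it begins at physical 0x00a00000 (10 MB). That is the
same size and offset as the crashkernel=20M@10M reservation mentioned earlier
in the thread. Whether the reservation is what produces the hole is not
established here; it is only an observation from the numbers.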