2013/3/30 Dave Anderson <anderson@xxxxxxxxxx>:
>
>
> ----- Original Message -----
>> 2013/3/28 Dave Anderson <anderson@xxxxxxxxxx>:
>> >
>> >
>> > ----- Original Message -----
>> >> 2013/3/27 Dave Anderson <anderson@xxxxxxxxxx>:
>> >> >
>> >> >
>> >> > ----- Original Message -----
>> >> >> 2013/3/26 Dave Anderson <anderson@xxxxxxxxxx>:
>> >> >> >
>> >> >> >
>> >> >> > ----- Original Message -----
>> >> >> >> Hi, list.
>> >> >> >>
>> >> >> >> I use crash-utility to analyse crash dump core from ARM soc. When I
>> >> >> >> execute command below, I get the error "crash: read error: kernel
>> >> >> >> virtual address: c0c1e040 type: "first vmap_area va_start"". I also
>> >> >> >> test it by gdb. It works fine. The Linux kernel's version is v3.0.8.
>> >> >> >>
>> >> >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux Vmcore
>> >> >> >>
>> >> >> >> crash 6.1.4
>> >> >> >> Copyright (C) 2002-2013 Red Hat, Inc.
>> >> >> >> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> >> >> >> Copyright (C) 1999-2006 Hewlett-Packard Co
>> >> >> >> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> >> >> >> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> >> >> >> Copyright (C) 2005, 2011 NEC Corporation
>> >> >> >> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> >> >> >> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> >> >> >> This program is free software, covered by the GNU General Public License,
>> >> >> >> and you are welcome to change it and/or distribute copies of it under
>> >> >> >> certain conditions. Enter "help copying" to see the conditions.
>> >> >> >> This program has absolutely no warranty. Enter "help warranty" for details.
>> >> >> >>
>> >> >> >> GNU gdb (GDB) 7.3.1
>> >> >> >> Copyright (C) 2011 Free Software Foundation, Inc.
>> >> >> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> >> >> >> This is free software: you are free to change and redistribute it.
>> >> >> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> >> >> >> and "show warranty" for details.
>> >> >> >> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
>> >> >> >>
>> >> >> >> crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
>> >> >> >>
>> >> >> >> Errors like the one above typically occur when the kernel and memory source
>> >> >> >> do not match. These are the files being used:
>> >> >> >>
>> >> >> >> KERNEL: vmlinux
>> >> >> >> DUMPFILE: Vmcore
>> >> >> >
>> >> >> > You've answered your own question -- you should always see errors if the vmlinux
>> >> >> > kernel does not match the kernel crashed system.
>> >> >> >
>> >> >> > If you cannot find/access the original vmlinux file that was being run
>> >> >> > by the crashed kernel, then get the /boot/System.map file of the crashed
>> >> >> > kernel, and enter it on the command line:
>> >> >> Thanks for your reply.
>> >> >>
>> >> >> The vmlinux, include debug information, and crash kernel, is
>> >> >> cross-compile built and produced together. I couldn't understand why
>> >> >> crash throw this warning "kernel and source doesn't match".
>> >> >>
>> >> >> >
>> >> >> > $ crash vmlinux Vmcore System.map
>> >> >> >
>> >> >> > The crash utility will replace all of the invalid symbol values from the
>> >> >> > "wrong" vmlinux file with their correct values from the System.map file.
>> >> >>
>> >> >>
>> >> >> A moment ago. I rebuilt the arm kernel source again. And took
>> >> >> "echo c > /proc/sysrq-trigger" command to trigger system panic.
>> >> >> The status lists below.
>> >> >> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux0327 Vmcore0327
>> >> >>
>> >> >> crash 6.1.4
>> >> >> Copyright (C) 2002-2013 Red Hat, Inc.
>> >> >> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> >> >> Copyright (C) 1999-2006 Hewlett-Packard Co
>> >> >> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> >> >> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> >> >> Copyright (C) 2005, 2011 NEC Corporation
>> >> >> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> >> >> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> >> >> This program is free software, covered by the GNU General Public License,
>> >> >> and you are welcome to change it and/or distribute copies of it under
>> >> >> certain conditions. Enter "help copying" to see the conditions.
>> >> >> This program has absolutely no warranty. Enter "help warranty" for details.
>> >> >>
>> >> >> GNU gdb (GDB) 7.3.1
>> >> >> Copyright (C) 2011 Free Software Foundation, Inc.
>> >> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> >> >> This is free software: you are free to change and redistribute it.
>> >> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> >> >> and "show warranty" for details.
>> >> >> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
>> >> >>
>> >> >> please wait... (gathering kmem slab cache data)
>> >> >> crash: read error: kernel virtual address: c0c91840 type: "kmem_cache buffer"
>> >> >>
>> >> >> crash: unable to initialize kmem slab cache subsystem
>> >> >>
>> >> >>
>> >> >> WARNING: invalid note (n_type != NT_PRSTATUS)
>> >> >>
>> >> >> WARNING: could not retrieve crash_notes
>> >> >> please wait... (gathering task table data)
>> >> >> crash: cannot read pid_hash upid
>> >> >>
>> >> >> crash: cannot read pid_hash upid
>> >> >> please wait... (determining panic task)
>> >> >> WARNING: cannot get stackframe for task
>> >> >>       KERNEL: vmlinux0327
>> >> >>     DUMPFILE: Vmcore0327
>> >> >>         CPUS: 1
>> >> >>         DATE: Thu Jan 1 08:00:00 1970
>> >> >>       UPTIME: 00:00:00
>> >> >> LOAD AVERAGE: 0.00, 0.00, 0.00
>> >> >>        TASKS: 1
>> >> >>     NODENAME: 10.38.50.241
>> >> >>      RELEASE: 3.0.8-00010-gb7f16a3-dirty
>> >> >>      VERSION: #339 Wed Mar 27 10:39:43 CST 2013
>> >> >>      MACHINE: armv7l (unknown Mhz)
>> >> >>       MEMORY: 19 MB
>> >> >>        PANIC: ""
>> >> >>          PID: 0
>> >> >>      COMMAND: "swapper"
>> >> >>         TASK: c02e0620 [THREAD_INFO: c02dc000]
>> >> >>          CPU: 0
>> >> >>        STATE: TASK_RUNNING (ACTIVE)
>> >> >>      WARNING: panic task not found
>> >> >>
>> >> >> crash>
>> >> >>
>> >> >>
>> >> >> It also didn't works so fine. Then I appended system.map, the output
>> >> >> result is also the same.
>> >> >
>> >> > OK, so then it's not clear to me why you're seeing those errors.
>> >> >
>> >> > Was the dumpfile created using kdump? It almost looks like the dump
>> >> > was taken while the system was still running? Have you *ever* created
>> >> > a dumpfile that resulted in an error-free crash session?
>> >>
>> >> Yes, the dumpfile is created by kdump. The dump was taken by
>> >> "echo c > /proc/sysrq-trigger".
>> >>
>> >> I will try another case by inserting a panic module tomorrow.
>> >> >
>> >> > Perhaps the ARM users on this list have seen this kind of thing?
>> >> >
>> >> > If you enter "crash -d8 ..." on the command line, you may get a better
>> >> > picture of what leads up to the errors shown above, and of most
>> >> > interest, the readmem() calls that generate the errors. If you
>> >> > see a "crash: read error: ...", then that means that the dumpfile
>> >> > doesn't contain the physical page associated with the virtual
>> >> > address shown. But it's not clear whether the address itself
>> >> > is legitimate, i.e., was it gathered from the wrong location.
>> >>
>> >> Sounds reasonable.
>> >>
>> >> >
>> >> >>
>> >> >> I try GDB to test it.
>> >> >> hfli@pc1935:~/work/crash-utility$ ./gdb-7.5/gdb/gdb vmlinux0327 Vmcore0327
>> >> >> GNU gdb (GDB) 7.5
>> >> >> Copyright (C) 2012 Free Software Foundation, Inc.
>> >> >> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> >> >> This is free software: you are free to change and redistribute it.
>> >> >> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> >> >> and "show warranty" for details.
>> >> >> This GDB was configured as "--host=x86 --target=arm-linux-gnueabi".
>> >> >> For bug reporting instructions, please see:
>> >> >> <http://www.gnu.org/software/gdb/bugs/>...
>> >> >> Reading symbols from /home/hfli/work/crash-utility/vmlinux0327...done.
>> >> >>
>> >> >> warning: exec file is newer than core file.
>> >> >
>> >> > Again, this bothers me -- why is it "newer" than the core file?
>> >> > Are you sure that they are *exactly* the same?
>> >>
>> >> I am sure they are *exactly* the same. :-)
>> >>
>> >> I'm not clear the internals of how to judge exec file and core file.
>> >
>> > gdb is warning that it appears that you must have compiled the vmlinux0327
>> > after the Vmcore0327 dumpfile was created? Perhaps it's because you copied
>> > the two files to the host system where you're running gdb from in the
>> > "wrong" order.
>> >
>> > What I was trying to confirm is that when you rebuilt the vmlinux file
>> > with debuginfo data, that you also *installed* that rebuilt kernel onto
>> > the target system prior to crashing it.
>> >
>> >>
>> >> >
>> >> >> [New LWP 278]
>> >> >> #0 0xc0155f7c in sysrq_handle_crash (key=99) at drivers/tty/sysrq.c:134
>> >> >> 134         *killer = 1;
>> >> >> (gdb) list
>> >> >> 129     {
>> >> >> 130         char *killer = NULL;
>> >> >> 131
>> >> >> 132         panic_on_oops = 1; /* force panic */
>> >> >> 133         wmb();
>> >> >> 134         *killer = 1;
>> >> >> 135     }
>> >> >> 136     static struct sysrq_key_op sysrq_crash_op = {
>> >> >> 137         .handler = sysrq_handle_crash,
>> >> >> 138         .help_msg = "Crash",
>> >> >> (gdb)
>> >> >>
>> >> >> gdb also works fine.
>> >> >>
>> >> >
>> >> > It works fine for gdb in the very limited case above. The crash utility
>> >> > is also "working fine" for a much more expansive access of the dumpfile.
>> >> > But if you tried to access the same locations in the dumpfile that the
>> >> > crash utility is doing during its initialization, then gdb would also
>> >> > fail.
>> >> >
>> >> > Let's take a simple example -- in your first email, you saw this error:
>> >> >
>> >> > crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
>> >> >
>> >> > which came from here:
>> >> >
>> >> > if (vt->flags & USE_VMAP_AREA) {
>> >> >         get_symbol_data("vmap_area_list", sizeof(void *), &vmap_area);
>> >> >         if (!vmap_area)
>> >> >                 return 0;
>> >> >         if (!readmem(vmap_area - OFFSET(vmap_area_list) +
>> >> >             OFFSET(vmap_area_va_start), KVADDR, &vmalloc_start,
>> >> >             sizeof(void *), "first vmap_area va_start", RETURN_ON_ERROR))
>> >> >                 non_matching_kernel();
>> >> >
>> >> > If I look at a sample ARM dumpfile I have, I see this:
>> >> >
>> >> > crash> p vmap_area_list
>> >> > vmap_area_list = $8 = {
>> >> >   next = 0xc30d4d78,
>> >> >   prev = 0xc06702b8
>> >> > }
>> >> >
>> >> > where the "next" pointer of 0xc30d4d78 above points to the "list" member
>> >> > of a vmap_area structure:
>> >> >
>> >> > crash> struct vmap_area
>> >> > struct vmap_area {
>> >> >     long unsigned int va_start;
>> >> >     long unsigned int va_end;
>> >> >     long unsigned int flags;
>> >> >     struct rb_node rb_node;
>> >> >     struct list_head list;       <== "next" points here
>> >> >     struct list_head purge_list;
>> >> >     void *private;
>> >> >     struct rcu_head rcu_head;
>> >> > }
>> >> > SIZE: 52
>> >> > crash>
>> >> >
>> >> > And I can dump that vmap_area structure like this:
>> >> >
>> >> > crash> struct -x vmap_area -l vmap_area.list 0xc30d4d78
>> >> > struct vmap_area {
>> >> >   va_start = 0xbf000000,
>> >> >   va_end = 0xbf005000,
>> >> >   flags = 0x4,
>> >> >   rb_node = {
>> >> >     rb_parent_color = 0xc2ca076d,
>> >> >     rb_right = 0x0,
>> >> >     rb_left = 0x0
>> >> >   },
>> >> >   list = {
>> >> >     next = 0xc2ca0778,
>> >> >     prev = 0xc0411ed4
>> >> >   },
>> >> >   purge_list = {
>> >> >     next = 0x0,
>> >> >     prev = 0x0
>> >> >   },
>> >> >   private = 0xc3396860,
>> >> >   rcu_head = {
>> >> >     next = 0x0,
>> >> >     func = 0
>> >> >   }
>> >> > }
>> >> >
>> >> > But your kernel found a "vmap_area_list.next" pointer of c0c1e040,
>> >> > but it was not accessible from the dumpfile.
>> >> >
>> >> > So either:
>> >> >
>> >> > (1) the "vmap_area_list" symbol value was not correct, or
>> >> > (2) the page containing the first vmap_area structure was
>> >> >     not included in the dumpfile.
>> >> >
>> >> > Problem (1) can happen if your crashed kernel doesn't match the
>> >> > vmlinux file, i.e., the symbol values don't match. But if the
>> >> > "vmap_area_list" symbol was correct, then (2) must have occurred,
>> >> > and that should never happen unless the dumpfile was corrupted or
>> >> > was created incorrectly.
>> >> >
>> >>
>> >> Agree.
>> >>
>> >> Thanks for your patience again.
>> >>
>> >> For my case, the crashkernel cmdline of crash kernel is
>> >> crashkernel=20M@10M. When the capture kernel launch, the
>> >> elfcorehdr=0x1d00000, and the initialization of /proc/vmcore will fail
>> >> with WARN_ON(pfn_valid(pfn)) throwing.
>> >>
>> >> The routine is
>> >> vmcore_init->parse_crash_elf_headers->read_from_oldmem->copy_oldmem_page->ioremap->__arm_ioremap->arch_ioremap_caller->__arm_ioremap_caller->__arm_ioremap_pfn_caller->WARN_ON(pfn_valid(pfn)).
>> >>
>> >> My temporary solution is comment the WARN_ON() to make
>> >> /proc/vmcore work.
>> >>
>> >> May my comment method corrupt the vmcore?
>> >
>> > Does the crash session come up cleanly?
>> >
>> > I don't know about the arm_ioremap issue -- that's for the ARM guys
>> > to answer.
>> >
>> > I'm not familiar with the specifics on how the kernel's vmcore creation works,
>> > but do you see differences in the contents of the PT_LOAD segments after applying
>> > your temporary solution? In other words, if you do this with an old vmcore
>> > vs. a new vmcore:
>> >
>> > $ readelf -a vmcore
>> > ELF Header:
>> >   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
>> >   Class:                             ELF32
>> >   Data:                              2's complement, little endian
>> >   Version:                           1 (current)
>> >   OS/ABI:                            UNIX - System V
>> >   ABI Version:                       0
>> >   Type:                              CORE (Core file)
>> >   Machine:                           ARM
>> >   Version:                           0x1
>> >   Entry point address:               0x0
>> >   Start of program headers:          52 (bytes into file)
>> >   Start of section headers:          0 (bytes into file)
>> >   Flags:                             0x0
>> >   Size of this header:               52 (bytes)
>> >   Size of program headers:           32 (bytes)
>> >   Number of program headers:         3
>> >   Size of section headers:           0 (bytes)
>> >   Number of section headers:         0
>> >   Section header string table index: 0
>> >
>> > There are no sections in this file.
>> >
>> > There are no sections to group in this file.
>> >
>> > Program Headers:
>> >   Type  Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
>> >   NOTE  0x000094  0x00000000 0x00000000 0x00514   0x00514       0
>> >   LOAD  0x0005a8  0xc0000000 0xc0000000 0x2000000 0x2000000 RWE 0
>> >   LOAD  0x20005a8 0xc2800000 0xc2800000 0x1800000 0x1800000 RWE 0
>> >
>> > There is no dynamic section in this file.
>> >
>> > There are no relocations in this file.
>> >
>> > No version information found in this file.
>> >
>> > Notes at offset 0x00000094 with length 0x00000514:
>> >   Owner       Data size   Description
>> >   CORE        0x00000094  NT_PRSTATUS (prstatus structure)
>> >   VMCOREINFO  0x00000452  Unknown note type: (0x00000000)
>> > $
>> >
>> > Are the LOAD sections different?
>>
>> hfli@msh-pc1935:~/work/crash-utility$ readelf -a Vmcore308
>> ELF Header:
>>   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
>>   Class:                             ELF32
>>   Data:                              2's complement, little endian
>>   Version:                           1 (current)
>>   OS/ABI:                            UNIX - System V
>>   ABI Version:                       0
>>   Type:                              CORE (Core file)
>>   Machine:                           ARM
>>   Version:                           0x1
>>   Entry point address:               0x0
>>   Start of program headers:          52 (bytes into file)
>>   Start of section headers:          0 (bytes into file)
>>   Flags:                             0x0
>>   Size of this header:               52 (bytes)
>>   Size of program headers:           32 (bytes)
>>   Number of program headers:         3
>>   Size of section headers:           0 (bytes)
>>   Number of section headers:         0
>>   Section header string table index: 0
>>
>> There are no sections in this file.
>>
>> There are no sections to group in this file.
>>
>> Program Headers:
>>   Type  Offset   VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
>>   NOTE  0x000094 0x00000000 0x00000000 0x000a8   0x000a8       0
>>   LOAD  0x00013c 0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
>>   LOAD  0xa0013c 0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0
>>
>> There is no dynamic section in this file.
>>
>> There are no relocations in this file.
>>
>> No version information found in this file.
>>
>> Notes at offset 0x00000094 with length 0x000000a8:
>>   Owner       Data size   Description
>>   CORE        0x00000094  NT_PRSTATUS (prstatus structure)
>>
>> ---
>> I notice Notes section has not _VMCOREINFO_.
>>
>> The following is my step of using kdump and crash utility.
>>
>> 1. built linux kernel source
>> 2. Put arch/arm/boot/uImage to tftp server;
>>    Put arch/arm/boot/uImage to nfs server.(kernel launch rootfs by NFS)
>> 3. bootup uImage with "crashkernel=20M@10M"
>> 4. load uImage of capture kernel.
>>    $./sbin/kexec -p --atags --append="console=ttyAM0,38400n8 root=/dev/nfs rw nfsroot=10.38.50.248:/nfs/nfs ip=10.38.50.241 loglevel=15 rdinit=/rdinit" /uImagetahoe308
>> 5. inserting panic module to trigger panic.
>>    $insmod module.ko
>> 6. capture kernel boots up. (In the progress of booting, capture will
>>    initialize /proc/vmcore. if the initialization of vmcore fails,
>>    /proc/vmcore won't existence.)
>> 7. use _cp_ tool dump the vmcore
>>    $cp /proc/vmcore /Vmcore308
>> 8. copy vmlinux & Vmcore308 to crash working directory and use crash
>>    utility analyse the Vmcore 308.
>>
>> hfli@pc1935:~/work/crash-utility$ ./crash vmlinux308 Vmcore308
>>
>> crash 6.1.4
>> Copyright (C) 2002-2013 Red Hat, Inc.
>> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> Copyright (C) 1999-2006 Hewlett-Packard Co
>> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> Copyright (C) 2005, 2011 NEC Corporation
>> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> This program is free software, covered by the GNU General Public License,
>> and you are welcome to change it and/or distribute copies of it under
>> certain conditions. Enter "help copying" to see the conditions.
>> This program has absolutely no warranty. Enter "help warranty" for details.
>>
>> GNU gdb (GDB) 7.3.1
>> Copyright (C) 2011 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
>>
>> crash: read error: kernel virtual address: c0c1e040 type: "first vmap_area va_start"
>>
>> Errors like the one above typically occur when the kernel and memory source
>> do not match. These are the files being used:
>>
>> KERNEL: vmlinux308
>> DUMPFILE: Vmcore308
>>
>> --
>> Unfortunately, the crash also read error and deduce the kernel and
>> memory source don't match.
>>
>> The vmcore initialization looks like fine. and copying the dump file
>> of /proc/vmcore also works fine.
>>
>> I couldn't know whether and why the vmcore is corrupt.
>
> I don't know either, but in the case above, kernel virtual address c0c1e040
> doesn't fit in the virtual address ranges declared in the vmcore header:
>
>> Program Headers:
>>   Type  Offset   VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
>>   NOTE  0x000094 0x00000000 0x00000000 0x000a8   0x000a8       0
>>   LOAD  0x00013c 0xc0000000 0x00000000 0xa00000  0xa00000  RWE 0
>>   LOAD  0xa0013c 0xc1e00000 0x01e00000 0x6200000 0x6200000 RWE 0
>
> If you go through the exercise I showed a few messages back, i.e., look at the
> kernel's vmap_area_list list_head structure by entering "p vmap_area_list", you
> should see its "next" pointer containing the c0c1e040 address. But the vmcore
> shows a hole between c0a00000 and c1e00000.
>
> Dave
>
>>
>>
>> Thanks.
>> >
>> > Anyway, if the crash session comes up cleanly when you apply your temporary
>> > solution, then clearly you've identified the problem at hand.
>> >
>> > Dave
>> >
>> >
>> > --
>> > Crash-utility mailing list
>> > Crash-utility@xxxxxxxxxx
>> > https://www.redhat.com/mailman/listinfo/crash-utility
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thanks. This board has only 128 MB of main memory in total. I will first
try kdump and the crash utility on another ARM SoC that has more main
memory.
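
The range check behind Dave's observation can be sketched in a few lines of
Python. This is only an illustration, not code from crash: the segment values
are copied from the Vmcore308 program headers quoted above, and the helper
names (find_segment, holes, crashkernel_range) are made up for this example.
It checks whether c0c1e040 lies inside any PT_LOAD segment, lists the gap
between the segments, and computes the physical range that
crashkernel=20M@10M reserves, for comparison.

```python
# PT_LOAD segments copied from the `readelf -a Vmcore308` output quoted above:
# (virtual start, physical start, size in bytes)
PT_LOAD = [
    (0xC0000000, 0x00000000, 0x00A00000),
    (0xC1E00000, 0x01E00000, 0x06200000),
]

MIB = 1 << 20


def find_segment(vaddr, segments=PT_LOAD):
    """Return the (vstart, pstart, size) segment containing vaddr, or None."""
    for vstart, pstart, size in segments:
        if vstart <= vaddr < vstart + size:
            return (vstart, pstart, size)
    return None


def holes(segments=PT_LOAD):
    """Return the virtual address gaps between consecutive segments."""
    ordered = sorted(segments)
    gaps = []
    for (v0, _, s0), (v1, _, _) in zip(ordered, ordered[1:]):
        if v0 + s0 < v1:
            gaps.append((v0 + s0, v1))
    return gaps


def crashkernel_range(size_mib, offset_mib):
    """Physical [start, end) range reserved by crashkernel=<size>M@<offset>M."""
    start = offset_mib * MIB
    return (start, start + size_mib * MIB)


if __name__ == "__main__":
    addr = 0xC0C1E040  # address from the "first vmap_area va_start" read error
    print("segment containing %#x: %s" % (addr, find_segment(addr)))
    for lo, hi in holes():
        print("hole: %#x-%#x" % (lo, hi))
    lo, hi = crashkernel_range(20, 10)
    print("crashkernel=20M@10M reserves physical %#x-%#x" % (lo, hi))
```

Run as a script, it reports that no segment contains c0c1e040 and that the
only gap is c0a00000-c1e00000. Going by the PhysAddr column, that gap is
physical 0x00a00000-0x01e00000, the same range that crashkernel=20M@10M
reserves. Whether that explains the unreadable pointer is not settled in this
thread.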