----- Original Message -----
>
> Hi Dave:
>
> Thank you very much for your detailed answer, this is really helpful.
> Please see my inline comments. Thanks.
>
> > Date: Thu, 17 Jan 2013 14:17:36 -0500
> > From: anderson@xxxxxxxxxx
> > To: crash-utility@xxxxxxxxxx
> > Subject: Re: questions about crash utility
> >
> > The fact that crash gets as far as it does at least means that the
> > ELF header you've created was deemed acceptable as an ARM vmcore.
> > However, the error messages re: "cpu_present_mask indicates..." and
> > "cannot determine base kernel version" indicate that the data
> > that was read from the vmcore was clearly not the correct data.
> >
> > The "cpu_present_mask" value that it read contained too
> > many bits -- presuming that the 32-bit ARM processor is
> > still limited to only 4 cpus. (looks like upstream that
> > CONFIG_NR_CPUS is still 2 in the arch/arm/configs files.)
> >
> > But more indicative of the wrong data being read is the second
> > "cannot determine base kernel version" message, which was generated
> > after it read the kernel's "init_uts_ns" uts_namespace structure.
> > After reading it, it sees that the "release" string contains
> > non-ASCII data, whereas it should contain the kernel version:
> >
> > crash> p init_uts_ns
> > init_uts_ns = $3 = {
> >   kref = {
> >     refcount = {
> >       counter = 2
> >     }
> >   },
> >   name = {
> >     sysname =
> > "Linux\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
> >     nodename =
> > "phenom-01.lab.bos.redhat.com\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
> >     release =
> > "2.6.32-313.el6.x86_64\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
> >     version =
> > "#1 SMP Thu Sep 27 16:25:19 EDT 2012\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
> >     machine =
> > "x86_64\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
> >     domainname =
> > "(none)\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> >   }
> > }
> > crash>
> >
> > So it appears that you're reading data from the wrong
> > locations in the dumpfile. You should be able to verify
> > that by bringing up the crash session with the --minimal
> > flag like this:
> >
> >   $ crash --minimal vmlinux vmcore
> >
> > That will bypass most of the initialization, including all
> > readmem() calls of the vmcore.
> > Then do this:
> >
> > crash> rd linux_banner 20
> > ffffffff818000a0:  65762078756e694c 2e33206e6f697372   Linux version 3.
> > ffffffff818000b0:  63662e312d312e35 365f3638782e3731   5.1-1.fc17.x86_6
> > ffffffff818000c0:  626b636f6d282034 69756240646c6975   4 (mockbuild@bui
> > ffffffff818000d0:  2e33322d6d76646c 6465662e32786870   ldvm-23.phx2.fed
> > ffffffff818000e0:  656a6f727061726f 202967726f2e7463   oraproject.org) 
> > ffffffff818000f0:  7265762063636728 372e34206e6f6973   (gcc version 4.7
> > ffffffff81800100:  303231303220302e 6465522820373035   .0 20120507 (Red
> > ffffffff81800110:  372e342074614820 47282029352d302e    Hat 4.7.0-5) (G
> > ffffffff81800120:  3123202920294343 75685420504d5320   CC) ) #1 SMP Thu
> > ffffffff81800130:  3120392067754120 2033343a30353a37    Aug 9 17:50:43 
> > crash> rd -a linux_banner
> > ffffffff818000a0:  Linux version 3.5.1-1.fc17.x86_64 (mockbuild@buildvm-23.phx2
> > ffffffff818000dc:  .fedoraproject.org) (gcc version 4.7.0 20120507 (Red Hat 4.7
> > ffffffff81800118:  .0-5) (GCC) ) #1 SMP Thu Aug 9 17:50:43 UTC 2012
> > crash>
> >
> > I'm guessing that you will not see a string starting with "Linux version"
> > with your dumpfile as shown above.
> >
> > If that's the case, then it's clear that the readmem() function is ultimately
> > reading from the wrong vmcore file offset.
> >
> > Here's what you can try doing. Taking the linux_banner example above,
> > you can check where in the dumpfile it's reading from by setting the debug
> > flag, before doing a simple read -- like this example on an ARM dumpfile:
> >
> > crash> set debug 8
> > debug: 8
> > crash> rd linux_banner
> > <addr: c033ea10 count: 1 flag: 488 (KVADDR)>
> > <readmem: c033ea10, KVADDR, "32-bit KVADDR", 4, (FOE), ff94f048>
> > <read_kdump: addr: c033ea10 paddr: 33ea10 cnt: 4>
> > read_netdump: addr: c033ea10 paddr: 33ea10 cnt: 4 offset: 33f088
> > c033ea10:  756e694c                              Linu
> > crash>
> >
> > The linux_banner is at virtual address c033ea10 (addr).
> > First it gets translated into physical address 33ea10 (paddr). Then that
> > paddr is translated into the vmcore file offset of 33f088. It lseeks to
> > vmcore file offset 33f088 and reads 4 bytes, which contain "756e694c", or
> > the first 4 bytes of the "Linux version ..." string.
> >
> > Note that if I subtract the physical address from vmcore file offset
> > I get this:
> >
> > crash> eval 33f088 - 33ea10
> > hexadecimal: 678
> >     decimal: 1656
> >       octal: 3170
> >      binary: 00000000000000000000011001111000
> > crash>
> >
> > which would put physical address 0 at a vmcore file offset of 0x678,
> > therefore implying that the ELF header comprises the first 0x678 bytes.
> > And looking at the vmcore, that can be verified:
> >
>
> Yes, you are right. Here is the result I get:
>
> crash> set debug 8
> debug: 8
> crash> rd linux_banner
> <addr: c065a071 count: 1 flag: 488 (KVADDR)>
> <readmem: c065a071, KVADDR, "32-bit KVADDR", 4, (FOE), ffdf297c>
> <read_kdump: addr: c065a071 paddr: 85a071 cnt: 4>
> read_netdump: addr: c065a071 paddr: 85a071 cnt: 4 offset: 65a0e5
> c065a071:  03e59130                              0...
>
> The virtual address is 0xc065a071, the physical address is
> 0x85a071, and the offset is 0x65a0e5.
> My ELF header is 116 bytes long, and 0x65a0e5 - 116 = 0x65a071, which
> differs from the physical address 0x85a071 by 0x00200000.
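
The offset arithmetic in the two debug traces above can be sketched in a few lines of C. This is only an illustration of the usual PT_LOAD mapping (file offset = p_offset + (paddr - p_paddr)), not the code crash itself uses; the struct and function names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct load_segment {
    uint64_t p_offset;  /* file offset of the segment's first byte */
    uint64_t p_paddr;   /* physical address of the segment's first byte */
    uint64_t p_filesz;  /* number of bytes of the segment stored in the file */
};

/*
 * Map a physical address to a dumpfile offset using one PT_LOAD segment.
 * Returns -1 if paddr does not fall inside the segment.
 */
int64_t paddr_to_offset(const struct load_segment *seg, uint64_t paddr)
{
    if (paddr < seg->p_paddr || paddr - seg->p_paddr >= seg->p_filesz)
        return -1;

    return (int64_t)(seg->p_offset + (paddr - seg->p_paddr));
}
```

With the sample ARM header (p_offset 0x678, p_paddr 0x0), paddr 0x33ea10 maps to file offset 0x33f088. With a LOAD header of p_offset 0x74 and p_paddr 0x00200000, paddr 0x85a071 maps to 0x65a0e5, which is the offset in the second trace. If the dump really begins at physical address 0x0, that byte sits at 0x74 + 0x85a071 = 0x85a0e5, which is 0x00200000 further into the file.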
> >
> > $ readelf -a vmcore
> > ELF Header:
> >   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
> >   Class:                             ELF32
> >   Data:                              2's complement, little endian
> >   Version:                           1 (current)
> >   OS/ABI:                            UNIX - System V
> >   ABI Version:                       0
> >   Type:                              CORE (Core file)
> >   Machine:                           ARM
> >   Version:                           0x1
> >   Entry point address:               0x0
> >   Start of program headers:          52 (bytes into file)
> >   Start of section headers:          0 (bytes into file)
> >   Flags:                             0x0
> >   Size of this header:               52 (bytes)
> >   Size of program headers:           32 (bytes)
> >   Number of program headers:         3
> >   Size of section headers:           0 (bytes)
> >   Number of section headers:         0
> >   Section header string table index: 0
> >
> > There are no sections in this file.
> >
> > There are no sections to group in this file.
> >
> > Program Headers:
> >   Type   Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
> >   NOTE   0x000094  0x00000000 0x004e345c 0x005e4   0x005e4       0
> >   LOAD   0x000678  0xc0000000 0x00000000 0x5600000 0x5600000 RWE 0
> >   LOAD   0x5600678 0xc5700000 0x05700000 0x100000  0x100000  RWE 0
> > ...
> >
> > Note that the "Offset" value of the first PT_LOAD segment has a file offset
> > value of 0x678.
> >
>
> Here is the result I get:
>
> Program Headers:
>   Type   Offset    VirtAddr   PhysAddr   FileSiz    MemSiz     Flg Align
>   NOTE   0x000000  0x00000000 0x00000000 0x00000    0x00000        0
>   LOAD   0x000074  0xc0000000 0x00200000 0x2fe00000 0x2fe00000 RWE 0
>
> So the problem is that I did not understand the meaning of the ELF header
> accurately. If I modify the code as below, everything is OK for me:
>
>   offset += sizeof(struct elf_phdr);
>   phdr->p_offset = offset+0x00200000;
>   phdr->p_vaddr = 0xc0000000;
>   phdr->p_paddr = 0x00200000;
>   phdr->p_filesz = phdr->p_memsz = MEMSIZE-0x00200000;
>
> Although my modification makes the crash utility work well, I want to
> know whether I am doing the right thing.
> 1. Our platform's DDR starts at physical address 0x0.
> 2. When compiling the Linux kernel, our platform sets this in the .config file:
>    CONFIG_PHYS_OFFSET=0x00200000
> 3. When the kernel crashes, all DDR content is dumped, from address
>    0x0~768MB, but the kernel data actually starts at 0x00200000.
>
> My questions are:
> 1. Is my setting of the ELF header correct this time? The offset,
>    paddr, and p_memsz?

I'm not really sure. Even though you've got it to work OK, I don't understand
your new phdr->p_offset and phdr->p_filesz/phdr->p_memsz settings. The
phdr->p_offset value typically points to the beginning of the physical memory
segment, which in your case would be at physical address 0x0 at file offset
0x74. And the phdr->p_filesz/phdr->p_memsz values are typically equal to the
full size of the physical memory segment (MEMSIZE).

I only have one ELF ARM dumpfile sample, but it does not have any physical
offset:

  crash> vtop c0000000
  VIRTUAL   PHYSICAL
  c0000000  0

  PAGE DIRECTORY: c0004000
    PGD: c0007000 => 1140e
    PMD: c0007000 => 1140e
   PAGE:        0  (1MB)

    PAGE    PHYSICAL   MAPPING    INDEX CNT FLAGS
  c042d000         0         0        0   0 80000
  crash>

Does "vtop c0000000" work as expected on your vmcore?

Also, can you read the last physical page of memory? For example, on my
ARM dump, I can check that by doing this:

  crash> kmem -p | tail -5
  c04dcf60   57fb000         0         0  1 400
  c04dcf80   57fc000         0         0  1 400
  c04dcfa0   57fd000         0         0  1 400
  c04dcfc0   57fe000         0         0  1 400
  c04dcfe0   57ff000         0         0  1 400
  crash> rd -p 57ff000
   57ff000:  ef9f0000                              ....
  crash>

Also, can you confirm that your kernel's symbol list starts at c0000000,
i.e., something like this:

  crash> sym -l
  c0004000 (A) swapper_pg_dir
  c0008000 (t) .init
  c0008000 (T) __init_begin
  c0008000 (T) _sinittext
  c0008000 (T) _stext
  c0008000 (T) stext
  c0008040 (t) __create_page_tables
  c00080e4 (t) __enable_mmu_loc
  c00080f0 (t) __error_a
  c00080f4 (t) __lookup_machine_type
  c0008128 (t) __lookup_machine_type_data
  ...

I just want to make sure that the kernel symbols actually start at c0000000,
and not c2000000.

> 2. I am wondering how the crash utility translates a virtual address to a
>    physical address before and after it gets the kernel page table.
>    Before it gets the kernel page table, does it calculate (virtual_addr -
>    p_vaddr + p_paddr)? After it gets the kernel page table, does it walk
>    through the page table and find the real physical address
>    accordingly?

For unity-mapped kernel virtual addresses, it's not necessary to walk the
page tables. It simply does this:

  #define VTOP(X) \
          ((unsigned long)(X)-(machdep->kvbase)+(machdep->machspec->phys_base))

You can check your machdep->kvbase and machdep->machspec->phys_base values
by entering "help -m", for example:

  crash> help -m | grep -e kvbase -e phys_base
                 kvbase: c0000000
              phys_base: 0
  crash>

Certainly vmalloc (and user-space) virtual addresses require a page table
walkthrough, but the arm_kvtop() function does this:

  static int
  arm_kvtop(struct task_context *tc, ulong kvaddr, physaddr_t *paddr, int verbose)
  {
          if (!IS_KVADDR(kvaddr))
                  return FALSE;

          if (!vt->vmalloc_start) {
                  *paddr = VTOP(kvaddr);
                  return TRUE;
          }

          if (!IS_VMALLOC_ADDR(kvaddr)) {
                  *paddr = VTOP(kvaddr);   <=== unity-mapped kernel virtual addresses
                  if (!verbose)
                          return TRUE;
          }

          return arm_vtop(kvaddr, (ulong *)vt->kernel_pgd[0], paddr, verbose);
  }

where vmalloc addresses fall through and arm_vtop() is called to walk
the page tables. However, you can translate unity-mapped addresses using
the kernel page tables with the "vtop" command, as shown in the
"vtop c0000000" example above.

> 3. My real purpose is to get the ftrace content from the dump file with the
>    crash utility, but it seems the "trace" command is not for this case. Do
>    I need to compile the "trace" extension of the crash utility? Is there
>    any guide to follow?

That's correct. You can do this:

  $ wget http://people.redhat.com/anderson/crash-6.1.2.tar.gz
  ...
  $ tar xvzmf crash-6.1.2.tar.gz
  ...
  $ cd crash-6.1.2
  $ make
  ...
  $ make extensions
  ...
  $ ./crash vmlinux vmcore
  ...
  crash> extend trace.so
  ./extensions/trace.so: shared object loaded
  crash> help trace
  ...
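
Going back to question 2: the unity-mapped translation is simple enough to try by hand. Below is a minimal standalone sketch of the same arithmetic as the VTOP() macro, for 32-bit addresses. The struct and function names are hypothetical stand-ins for the machdep fields, and the phys_base of 0x00200000 in the note after it is only an assumption taken from your CONFIG_PHYS_OFFSET setting.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two machdep values that VTOP() uses. */
struct unity_map {
    uint32_t kvbase;     /* first unity-mapped kernel virtual address */
    uint32_t phys_base;  /* physical address that kvbase corresponds to */
};

/* Same arithmetic as VTOP(): subtract kvbase, then add phys_base. */
uint32_t unity_vtop(const struct unity_map *map, uint32_t kvaddr)
{
    return kvaddr - map->kvbase + map->phys_base;
}
```

With kvbase c0000000 and phys_base 0, c033ea10 translates to 33ea10, as in my ARM sample. With kvbase c0000000 and an assumed phys_base of 0x00200000, c065a071 translates to 85a071, which matches the paddr in your debug output.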

Dave