On Thu, 2008-02-07 at 16:46 -0500, Dave Anderson wrote:
> Andrew Hecox wrote:
> > On Thu, 2008-02-07 at 16:04 -0500, Dave Anderson wrote:
> >> Andrew Hecox wrote:
> >>> On Thu, 2008-02-07 at 15:38 -0500, Dave Anderson wrote:
> >>>> Andrew Hecox wrote:
> >>>>> I get the same:
> >>>>>
> >>>>> (/boot/System.map-2.6.9-67.0.1.ELhugemem)
> >>>>>
> >>>>> 02323bd8 d log_buf_len
> >>>>>
> >>>>> (/usr/lib/debug/lib/modules/2.6.9-67.0.1.ELhugemem/vmlinux)
> >>>>>
> >>>>> $1 = (int *) 0x2323bd8
> >>>>>
> >>>>> -Andrew
> >>>> So, as Takao suggested, can you dump the incoming vaddr and
> >>>> resultant pfn values in diskdumpmsg.c:read_buffer()?
> >>>>
> >>> The vaddr value is: 36846552.
> >>>
> >>> -Andrew
> >>>
> >>>> Dave
> >>>>
> >>>>
> >> OK, so the incoming vaddr is 36846552, which is 0x2323bd8.
> >> To get a pfn, that hugemem kernel virtual address is passed
> >> through vtop() and then divided by 4096:
> >>
> >> static int read_buffer(DumpFile *dump, addr_t vaddr, size_t len, void *buf)
> >> {
> >>     addr_t paddr;
> >>     int block_size = get_page_size();
> >>     unsigned long pfn;
> >>     int ret;
> >>     size_t copy_len, offs;
> >>     void *page_data;
> >>
> >>     paddr = vtop(dump, vaddr);
> >>     pfn = paddr / block_size;
> >>     offs = paddr % block_size;
> >>
> >> When 0x2323bd8 is run through vtop(), it simply strips off the
> >> hugemem unity-map identifier:
> >>
> >> addr_t vtop(DumpFile *dump, addr_t vaddr)
> >> {
> >>     if (strstr("hugemem", dump->utsname->release))
> >>         return vaddr - 0x02000000L;
> >>     else
> >>         return vaddr - 0xc0000000L;
> >> }
> >>
> >> leaving 0x323bd8 -- which gets divided by the page size of 4096, leaving
> >> a pfn of 0x323.
> >>
> >> But you see that the pfn was 271139 (0x42323). If that is expanded
> >> to a physical address it would be 0x42323000. It looks like it's
> >> using the non-hugemem value in vtop(), i.e., subtracting 0xc0000000 from
> >> the incoming vaddr. In other words, 0x2323bd8 - 0xc0000000 is
> >> equal to 0x42323bd8 (with 32-bit wraparound). If that is divided by 4096, you get
> >> the funky pfn of 271139 (0x42323).
> >>
> >> Print out the dump->utsname->release string in vtop(). It must
> >> not contain "hugemem".
> >>
> >
> > Dave,
> >
> > I get:
> >
> > (gdb) print dump->utsname->release
> > $19 = "2.6.9-67.0.1.ELhugemem", '\0' <repeats 42 times>
> >
> > but then
> >
> > (gdb) s
> > 16              return vaddr - 0xc0000000L;
> >
> > ! oh uh.
> >
> > man strstr
> >
> > ...
> > char *strstr(const char *haystack, const char *needle);
> > ...
> >
> > It looks like
> >
> > if (strstr("hugemem", dump->utsname->release))
> >
> > should be:
> >
> > if (strstr(dump->utsname->release,"hugemem"))
>
> Bingo -- like the man page says:
>
>   char *strstr(const char *haystack, const char *needle);
>
> >
> > I patched, recompiled, tested and it works:
> >
> > [root@ibm-x3455-1 ~]# diskdumpmsg -f -p /var/crash/vmcore
> > Jan 31 05:43:08 elabhost012 kernel: --- salvaged messages from crash dump start
> > Jan 31 05:43:08 elabhost012 kernel: 0218b9c0 0232d363 0232d3e0 0215aff6 df954fac f6db4000 eaa756c0 fffffff7
> > Jan 31 05:43:08 elabhost012 kernel: f6db4000 df954000 0215b0c0 df954fac 00000000 00000000 00000000 df954fc4
> > Jan 31 05:43:08 elabhost012 kernel: Call Trace:
> > Jan 31 05:43:08 elabhost012 kernel: [<0220c46a>] __handle_sysrq+0x58/0xc6
> > Jan 31 05:43:08 elabhost012 kernel: [<0218b9c0>] write_sysrq_trigger+0x37/0x3e
> > Jan 31 05:43:08 elabhost012 kernel: [<0215aff6>] vfs_write+0xb6/0xe2
> > Jan 31 05:43:08 elabhost012 kernel: [<0215b0c0>] sys_write+0x3c/0x62
> > Jan 31 05:43:08 elabhost012 kernel: Code: 11 02 c7 05 10 fd 44 02 00 00
> > 00 00 c7 05 38 fd 44 02 00 00 00 00 c7 05 2c fd 44 02 6e ad 87 4b 89 15
> > 28 fd 44 02 e9 8b 41 f2 ff <c6> 05 00 00 00 00 00 c3 e9 0a ff f4 ff e9
> > a2 48 f5 ff 85 d2 89
> > Jan 31 05:43:08 elabhost012 kernel: --- salvaged messages from crash dump end
> >
> > Thanks much for all the help! Should I open a bz against the issue? It
> > looks like all i386 hugemem kernels would be similarly affected.
>
> Yep -- definitely open a BZ against component "diskdumputils".
>

I've opened bz431937 for the strstr change and bz431943 for the lack of
input validation that caused the FPE. I separated them since one actually
fixes an issue for production users and the other just provides a better
error without making anything work.

-Andrew

> Dave
>

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility