On Thu, 2008-02-07 at 14:38 -0500, Takao Indoh wrote:
> Hi Andrew,
>
> Dave Anderson wrote:
> > Andrew Hecox wrote:
> >> On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
> >>> Andrew Hecox wrote:
> >>>> hello,
> >>>>
> >>>> I'm looking at a customer issue where diskdumpmsg is unable to read a
> >>>> vmcore file. It is not clear if this is a problem with the vmcore file or
> >>>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
> >>>> it, can see no problems. However, I'm new to the tool so that doesn't
> >>>> give me a lot of confidence.
> >>>> Does anyone have any suggestions on how or if I can use crash to help
> >>>> determine if there's corruption in the vmcore file? Or any other way of
> >>>> approaching the problem?
> >>>> Thanks much,
> >>>>
> >>>> Andrew
> >>>>
> >>> I'm not sure what you expect the crash utility to do -- if it comes
> >>> up to a prompt with no error or warning messages, it means that the
> >>> ELF header contains what appears to be valid usable information,
> >>> and that the minimum kernel memory contents required to set up the
> >>> crash utility's notion of the running system are all in place. That's
> >>> not to say that there is no chance that the vmcore contains some
> >>> corruption that was not recognized.
> >>>
> >>
> >> Thanks. Any other suggestions on how to determine if a vmcore is "valid"
> >> or is that not even a reasonable question to try and ask? The problem
> >> I'm trying to solve is described better below:
> >>
> >>> With respect to diskdumpmsg, as I understand it, it was fairly recently
> >>> changed from a perl script to a C file so that it could be run
> >>> earlier in time so as to be able to use the swap partition. Looking
> >>> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
> >>> error types and associated error messages. What do you mean when you
> >>> say that "diskdumpmsg is unable to read a vmcore file"?
> >>
> >> Specifically:
> >> - user reported a floating point exception from diskdump on startup
> >> - the result was reproducible locally but only with their vmcore file
> >> - fpe occurred in get_logbuf:
> >>       log_end %= log_buf_len;
> >> - log_buf_len had been set to 0 in read_buffer
> >>       if (!page_is_dumpable(pfn, dump->device)) {
> >>           memset(buf, 0, copy_len);
> >>       } else {
> >> - I don't know enough to say if the page really wasn't dumpable.
> >>       static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
> >>       {
> >>           return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
> >>       }
> >> - I wrote a patch with one way to avoid the FPE (attached) and sent it
> >>   to SEG.
> >>
> >> Now I'm trying to determine if the vmcore file should be readable by
> >> diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
> >> or a problem with the vmcore file prior to it getting to diskdumpmsg?
> >> Unfortunately, I don't understand the problem domain very well at all,
> >> hence the probably naive questions :)
> >>
> >> Any suggestions are appreciated.
> >>
> >> -Andrew
> >
> > So it appears that the page containing the log_buf_len symbol is not
> > readable or contained in the dumpfile. BTW, is this a compressed
> > dumpfile or an ELF formatted dumpfile? And what "dump_level" did
> > they configure?
> >
> > Anyway, back to the log_buf_len symbol read, what happens when you
> > enter the "log" command while in a crash session? It attempts to
> > read that symbol immediately.
>
> The virtual address of log_buf_len may be converted to the wrong pfn.
> Could you check the pfn value passed to "page_is_dumpable"?
>

The value of pfn which is passed to page_is_dumpable is 271139.

-Andrew

> Thanks,
> Takao Indoh
>
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
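
For reference, a minimal sketch in C of the two checks discussed in this thread: the bitmap lookup done by page_is_dumpable, and a guard around the modulo in get_logbuf that raised the FPE. This is not the attached patch and not the diskdumpmsg source. The reduced DumpDevice struct, its bitmap_bytes field, and the function names page_is_dumpable_checked and wrap_log_end are all hypothetical, introduced only for illustration. For the reported pfn of 271139, the lookup reads byte 271139 >> 3 = 33892 of the bitmap and tests bit 271139 & 7 = 3.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for diskdumpmsg's DumpDevice; only the
 * dumpable bitmap is modelled here. */
typedef struct {
    const uint8_t *dumpable_bitmap; /* one bit per page frame */
    size_t bitmap_bytes;            /* assumed field: bitmap size in bytes */
} DumpDevice;

/* Same bit test as the quoted page_is_dumpable(), with an added bounds
 * check (an assumption, not in the original) so that a bad pfn cannot
 * index past the end of the bitmap. */
bool page_is_dumpable_checked(unsigned int pfn, const DumpDevice *dev)
{
    size_t byte = pfn >> 3;          /* which byte holds this pfn's bit */
    unsigned int bit = pfn & 7;      /* which bit inside that byte */

    if (byte >= dev->bitmap_bytes)
        return false;
    return (dev->dumpable_bitmap[byte] & (1u << bit)) != 0;
}

/* Wrap log_end into the ring buffer. Returns false and leaves log_end
 * untouched when log_buf_len is 0, which is the case that triggered
 * the floating point exception (integer division by zero). */
bool wrap_log_end(unsigned long *log_end, unsigned long log_buf_len)
{
    if (log_buf_len == 0)
        return false;
    *log_end %= log_buf_len;
    return true;
}
```

A caller would treat a false return from wrap_log_end as "log_buf_len could not be read from the dump" and report an error instead of continuing. Whether the page for pfn 271139 should have been marked dumpable is a separate question that this sketch does not answer.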