Re: determining a "valid" vmcore

Andrew Hecox <ahecox@xxxxxxxxxx> · Thu, 07 Feb 2008 15:39:54 -0500

On Thu, 2008-02-07 at 14:38 -0500, Takao Indoh wrote:
> Hi Andrew,
> 
> Dave Anderson wrote:
> > Andrew Hecox wrote:
> >> On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
> >>> Andrew Hecox wrote:
> >>>> hello,
> >>>>
> >>>> I'm looking at a customer issue where diskdumpmsg is unable to read a
> >>>> vmcore file. It is not clear if this is a problem with the vmcore file or
> >>>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
> >>>> it, can see no problems. However, I'm new to the tool so that doesn't
> >>>> give me a lot of confidence.
> >>>> Does anyone have any suggestions on how or if I can use crash to help
> >>>> determine if there's corruption in the vmcore file? Or any other way of
> >>>> approaching the problem?
> >>>> Thanks much,
> >>>>
> >>>> Andrew
> >>>>
> >>> I'm not sure what you expect the crash utility to do -- if it comes
> >>> up to a prompt with no error or warning messages, it means that the
> >>> ELF header contains what appears to be valid usable information,
> >>> and that the minimum kernel memory contents required to set up the
> >>> crash utility's notion of the running system are all in place.  That's
> >>> not to say that there is no chance that the vmcore contains some
> >>> corruption that was not recognized.
> >>>
> >>
> >> Thanks. Any other suggestions on how to determine if a vmcore is "valid"
> >> or is that not even a reasonable question to try and ask? The problem
> >> I'm trying to solve is described better below:
> >>
> >>> With respect to diskdumpmsg, as I understand it, it was fairly recently
> >>> changed from a perl script to a C file so that it could be run
> >>> earlier in time so as to be able to use the swap partition.  Looking
> >>> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are
> >>> numerous
> >>> error types and associated error messages.  What do you mean when you
> >>> say that "diskdumpmsg is unable to read a vmcore file"?
> >>
> >> Specifically:
> >>  - user reported a floating point exception from diskdump on startup
> >>  - the result was reproducible locally but only with their vmcore file
> >>  - fpe occurred in get_logbuf:
> >>                 log_end %= log_buf_len;
> >>  - log_buf_len had been set to 0 in read_buffer
> >>           if (!page_is_dumpable(pfn, dump->device)) {
> >>               memset(buf, 0, copy_len);
> >>           } else {
> >>  - I don't know enough to say if the page really wasn't dumpable. 
> >> static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
> >> {
> >>   return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
> >> }
> >>  - I wrote a patch with one way to avoid the FPE (attached) and sent it
> >> to SEG.
> >>
> >> Now I'm trying to determine if the vmcore file should be readable by
> >> diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
> >> or a problem with the vmcore file prior to it getting to diskdumpmsg.
> >> Unfortunately, I don't understand the problem domain very well at all,
> >> hence the probably naive questions :)
> >>
> >> Any suggestions are appreciated.
> >>
> >> -Andrew
> > 
> > So it appears that the page containing the log_buf_len symbol is either
> > not readable or not contained in the dumpfile.  BTW, is this a compressed
> > dumpfile or an ELF formatted dumpfile?  And what "dump_level" did
> > they configure?
> > 
> > Anyway, back to the log_buf_len symbol read, what happens when you
> > enter the "log" command while in a crash session?  It attempts to
> > read that symbol immediately.
> 
> The virtual address of log_buf_len may be converted to the wrong pfn.
> Could you check the pfn value passed to "page_is_dumpable"?
> 

The value of pfn which is passed to page_is_dumpable is 271139.
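
For reference, a minimal sketch of where that pfn lands in the bitmap,
following the indexing in the page_is_dumpable() quoted above: pfn 271139
selects byte 271139 >> 3 = 33892 and bit 271139 & 7 = 3 (mask 0x08). The
struct below is a hypothetical stand-in for DumpDevice, and bitmap_bytes,
page_is_dumpable_checked() and wrap_log_end() are illustrative names that
are not part of diskdumpmsg. The modulo guard is only one possible way to
avoid the FPE and is not the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for diskdumpmsg's DumpDevice; only the bitmap
 * is modelled here. bitmap_bytes is an assumed field for bounds checks. */
typedef struct {
    const uint8_t *dumpable_bitmap; /* one bit per page frame */
    size_t bitmap_bytes;            /* size of the bitmap in bytes */
} DumpDevice;

/* Byte of the bitmap that holds the bit for this pfn. */
static inline size_t bitmap_byte(unsigned int pfn)
{
    return pfn >> 3;
}

/* Mask selecting this pfn's bit within that byte. */
static inline uint8_t bitmap_mask(unsigned int pfn)
{
    return (uint8_t)(1u << (pfn & 7));
}

/* Same lookup as page_is_dumpable(), but refuses to read past the end
 * of the bitmap and returns a strict true/false. */
static inline bool page_is_dumpable_checked(unsigned int pfn,
                                            const DumpDevice *device)
{
    size_t byte = bitmap_byte(pfn);

    if (device == NULL || device->dumpable_bitmap == NULL ||
        byte >= device->bitmap_bytes)
        return false;
    return (device->dumpable_bitmap[byte] & bitmap_mask(pfn)) != 0;
}

/* Wrap log_end into the buffer only when the length is non-zero.
 * Returns false (leaving log_end untouched) when log_buf_len is 0,
 * which is the case that raised the FPE. */
static inline bool wrap_log_end(unsigned long *log_end,
                                unsigned long log_buf_len)
{
    if (log_buf_len == 0)
        return false;
    *log_end %= log_buf_len;
    return true;
}
```

With a bitmap in which byte 33892 has bit 3 clear, the lookup reports the
page as not dumpable, read_buffer zero-fills the result, and log_buf_len
comes back as 0. A caller could treat a false return from wrap_log_end()
as "log buffer not present in the dump" instead of dividing by zero.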

-Andrew

> Thanks,
> Takao Indoh
> 
