Re: add support for incomplete elf dump file

Dave Anderson <anderson@xxxxxxxxxx> · Fri, 24 Oct 2014 10:05:57 -0400 (EDT)

----- Original Message -----
> On 10/23/2014 08:10 PM, Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> >> Hello Dave,
> >>
> >> On 10/22/2014 09:49 PM, Dave Anderson wrote:
> >>>
> >>> Hello Wenjian,
> >>>
> >>> First -- please send patches as email attachments.  Even if I cut-and-paste your
> >>> patch from the crash-utility archives (which normally works), there are
> >>> still no tabs in your patch.
> >>
> >> Appreciate your suggestion.
> >>
> >>> Now, with respect to the patch, it's not clear to me what would happen if you
> >>> make no changes to check_dumpfile_size()?
> >>>
> >>> It seems to me that a read error would occur regardless whether you change
> >>> the PT_LOAD-related contents or not.
> >>>
> >>> If no changes are made:
> >>>
> >>>     If a readmem() of a truncated page is attempted, read_netdump() would
> >>>     calculate an offset based upon the original PT_LOAD contents, but then
> >>>     the subsequent read() would fail, and would return a READ_ERROR.
> >>>
> >>> If your patch is applied:
> >>>
> >>>     If a readmem() of a truncated page is attempted, read_netdump() would not
> >>>     be able to calculate an offset, and would return a READ_ERROR.
> >>>
> >>> What's the difference?  Why even bother making the changes?
> >>>
> >> If my patch is applied:
> >> 	If a readmem() of a truncated page is attempted,
> >>
> >> 	(pls->zero_fill&&  (paddr>= pls->phys_end)&&  (paddr<  pls->zero_fill)),
> >> 	,his will be right. So read_netdump() will fill bufptr with zero and
> >> 	return cnt.
> >>
> >> In my patch, information of PT_LOAD segment is modified, so that the segment will
> >> not go beyond the file size. And use zero to replace the lost part.
> >>
> >> Previously, we intend to do this work in makedumpfile by modifying the PT_LOAD
> >> header of the ELF dump file. But kumagai atsushi thought it's not good to make the
> >> irreversible change. So we think do it in crash maybe a better choice.
> >>
> >>> I also don't understand what the difference between "truncated" and "incomplete"
> >>> is?  Why did you separate the messages into two?
> >>>
> >> At first, I thought the difference is the incomplete flag.
> >> Now, I think it's not necessary to distinguish them.
> >>
> >>
> >>> Anyway, I had already created a patch in preparation for the changes to
> >>> makedumpfile for ELF and compressed kdump vmcores.  The patch will apply
> >>> to the current github master branch.  Please apply it alone, and tell me
> >>> what happens.
> >>
> >> The patch can show the incomplete information, but will be interrupted by the
> >> READ_ERROR (we don't change the PT_LOAD headers in makedumpfile any more).
> >
> > Well, it *should* be interrupted.  What's worse, "interruption" or the receipt
> > of completely bogus zero-filled data?
> >
> > This whole scheme is even crazier than I thought.  I'm going to have
> > to make the initialization-time warning message even more ominous than
> > I already have it.
> >
> Actually, I haven't realized it until doing the work in crash.
> I think which is better or worse depends on what people want to do.
> If incomplete flag exists, it may indicate people want to analyse the file,
> and returning zero-filled data is better. Otherwise, "interruption" may be
> expected.

Right, this needs much more discussion... (see below)

> 
> > But I still don't think it's necessary to modify the original PT_LOAD
> > contents.  If read_netdump() finds a legitimate offset, but that offset
> > is beyond the end of the incomplete/truncated file, it can just check the
> > INCOMPLETE flag and return a zero-filled buffer if it's set.   Wouldn't
> > that work?
> 
> It does work. And now, the question turns to be which way is better,
> first:   do it in check_dumpfile_size() by modifying relevant information
> 	 (not modifying the information in ELF file)
> 
> second:  do it in read_netdump() by return zero-filled data with
>           checking offset and flag
> isn't it?

Correct, check_dumpfile_size() is meant to be informational in nature.
If zero-filled data gets returned, it should really should be done
in read_netdump().

With the patch I that proposed, you can simplify this part of your patch:

+                       unsigned int e_flag = (NULL == nd->elf64) ?
+                                                (nd->elf32)->e_flags : (nd->elf64)->e_flags;
+                       int status = e_flag & DUMP_ELF_INCOMPLETE;
+                       if (status && read_ret >= 0) {

to just this:
                        if (is_incomplete_dump() && (read_ret >= 0)) { 

But what about compressed kdumps?  In the case of DUMP_DH_COMPRESSED_INCOMPLETE
dumpfiles, will makedumpfile manipulate the bitmaps and the page_desc.offset values
of all "missing" pages to point to the special "pd_zero" page?  Will it be
impossible to distinguish between legitimate zero-filled pages and "missing" pages?

I suppose that if that is true, then the behavior should be the same for
both ELF and compressed kdumps, which would be to return zero-filled data.

On the other hand...

What's bothersome about automatically returning zero-filled data is that,
currently, if an ELF vmcore was originally created correctly, but gets truncated
*after* the fact, for example, when downloading it from a customer site,
check_dumpfile_size() will report the error.  But when data from the truncated
region gets read, a read error is returned. 

If zero-filled data is silently returned, I can imagine that there will be all 
kinds of errors that will occur -- both display/content errors, but it could
easily cause SIGSEGV's in the crash utility when it tries to use zero-filled
data thinking that it's legitimate.  How would the user even know whether it's
because the page was truncated or not?

The crash utility already has a --zero_excluded flag for returning zero-filled
memory if by chance a *legitimately* excluded/filtered page is read.  It's
hardly ever used, because those pages should never be required.  But I
wonder whether it could be re-purposed in this case, where by default an error
would be returned when reading missing pages, and zero-filled only if it is
explicitly requested with the --zero_excluded command line option?  Or would
that be impossible with DUMP_DH_COMPRESSED_INCOMPLETE compressed kdumps?

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility