Re: [PATCH] makedumpfile: cope with not-present mem section

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 20, 2020 at 03:40:12PM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
> > On 02/20/2020 04:12 AM, HAGIO KAZUHITO wrote:
> > > Hi Cascardo,
> > >
> > > Do you have any solution or detailed information on the failure on your kernel?
> > > or could you try this branch?  It has an additional patch on top of Pingfan's
> > > one to avoid the false positive failure that I'm suspecting:
> > > https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> > >
> > > Pingfan,
> > > Do you have an output of makedumpfile when the original failure occurs?
> > > If you don't and it's hard to get it, no need to do so.  I just would like to
> > > add it to your patch if available.
> > I did the test on a PowerVM. After hot removing the memory, save a raw
> > vmcore by "cp", then run makedumpfile against the "cp" vmcore, and it
> > will get the following error message:
> > # makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump
> > get_mem_section: Could not validate mem_section.
> > get_mm_sparsemem: Can't get the address of mem_section.
> > 
> > makedumpfile Failed.
> 
> Thank you, will add it to your patch.
> 
> and a bit of explanation for the branch above..
> 
> > >>>> But from the logic of this patch, it just does the following changes:
> > >>>> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
> > >>>> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
> > >>>> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
> > >>>> be skipped.
> > >>>> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
> > >>>> the same handling of hot-removed, and be skipped during parsing.
> > >>>>
> > >>>> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
> > >>>> = 0, which can not be violated by the above changes.
> > >> Ping. As all of us can not reproduce this bug by v4.4 kernel, further
> > >> more, there is no code analysis, which persuades us this patch will
> > >> break the makedumpfile on any kernel version.
> 
> I'm not clear what causes it on the 4.4 kernel because of lack of information.
> But at least your patch relaxes the condition to recognize data of an address
> as a valid mem_section.  So I'm concerned that both of the first and second
> validate_mem_section() may return true by accident or something.  If it is not
> caused by a patch in the 4.4 kernel, for example, just data location causes it,
> it may occur on upstream kernels.  Although we cannot reproduce it so far.
> 
> Whatever causes it, in the first place, upstream kernels with vmcoreinfo don't
> need the second validate_mem_section().  It's almost for some downstream
> kernels and has a risk that causes problem like this.  So I'm thinking that
> it might be safer to change it to the if(!ret) way on the branch above
> with your patch.
> 
> Thanks,
> Kazu
> 

Hi, Kazu.

For now, I would keep Pingfan's patch as is. I decided to test some other
kernels, like 3.16 and 4.9 without commit
83e3c48729d9ebb7af5a31a504f3fd6aff0348c4.

So, they would be valid for the first iteration and invalid for the second,
just like Ubuntu's 4.4 kernel. As I couldn't reproduce the problem, I
investigated further and realized I was testing without your commit for
makedumpfile: 8f1aafa1643532ece86cba22f2187d0f42fb7ca3 ("PATCH Fix
validate_mem_section()").

With that one, all of Ubuntu's 4.4 kernel and Debian's 3.16 and 4.9 kernels
dump correctly. Without that one, all fail, and I suppose it would be easily
reproducible.

So, in effect, we are looking for any entry with the present bit, and then
checking it is valid kernel address. That seems to work just fine.

As you mention, the only case we do the second check is for some downstream
kernels (though I would argue we should care about those the most), but at
least from the Ubuntu side, those should not be around in the field anymore,
and, by default, those should be the rare exception anyway. So, I agree with
your follow-up commit in your branch, as it also simplifies the code a lot.

If you care, feel free to add my Ack to the two patches.

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@xxxxxxxxxxxxx>

Thanks for your patience.
Thadeu Cascardo.

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec




[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux