-----Original Message----- > On 02/20/2020 04:12 AM, HAGIO KAZUHITO wrote: > > Hi Cascardo, > > > > Do you have any solution or detailed information on the failure on your kernel? > > or could you try this branch? It has an additional patch on top of Pingfan's > > one to avoid the false positive failure that I'm suspecting: > > https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation > > > > Pingfan, > > Do you have an output of makedumpfile when the original failure occurs? > > If you don't and it's hard to get it, no need to do so. I just would like to > > add it to your patch if available. > I did the test on a PowerVM. After hot removing the memory, save a raw > vmcore by "cp", then run makedumpfile against the "cp" vmcore, and it > will get the following error message: > # makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump > get_mem_section: Could not validate mem_section. > get_mm_sparsemem: Can't get the address of mem_section. > > makedumpfile Failed. Thank you, will add it to your patch. and a bit of explanation for the branch above.. > >>>> But from the logic of this patch, it just does the following changes: > >>>> -1. After memory hot-removed, either mem_section.section_mem_map = NULL > >>>> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will > >>>> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will > >>>> be skipped. > >>>> -2. If not populated, mem_section.section_mem_map = NULL. It can follow > >>>> the same handling of hot-removed, and be skipped during parsing. > >>>> > >>>> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map > >>>> = 0, which can not be violated by the above changes. > >> Ping. As all of us can not reproduce this bug by v4.4 kernel, further > >> more, there is no code analysis, which persuades us this patch will > >> break the makedumpfile on any kernel version. I'm not clear what causes it on the 4.4 kernel because of lack of information. But at least your patch relaxes the condition to recognize data of an address as a valid mem_section. So I'm concerned that both of the first and second validate_mem_section() may return true by accident or something. If it is not caused by a patch in the 4.4 kernel, for example, just data location causes it, it may occur on upstream kernels. Although we cannot reproduce it so far. Whatever causes it, in the first place, upstream kernels with vmcoreinfo don't need the second validate_mem_section(). It's almost for some downstream kernels and has a risk that causes problem like this. So I'm thinking that it might be safer to change it to the if(!ret) way on the branch above with your patch. Thanks, Kazu _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec