On 02/20/2020 04:12 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
>
> Do you have any solution or detailed information on the failure on your kernel?
> or could you try this branch? It has an additional patch on top of Pingfan's
> one to avoid the false positive failure that I'm suspecting:
> https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
>
> Pingfan,
> Do you have an output of makedumpfile when the original failure occurs?
> If you don't and it's hard to get it, no need to do so. I just would like to
> add it to your patch if available.

I did the test on a PowerVM. After hot removing the memory, save a raw
vmcore by "cp", then run makedumpfile against the "cp" vmcore, and it
will get the following error message:

# makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump
get_mem_section: Could not validate mem_section.
get_mm_sparsemem: Can't get the address of mem_section.

makedumpfile Failed.

Thanks,
Pingfan

>
> Thanks,
> Kazu
>
> -----Original Message-----
>> On 02/12/2020 12:11 PM, piliu wrote:
>>>
>>>
>>> On 02/06/2020 11:46 AM, piliu wrote:
>>>>
>>>>
>>>> On 02/05/2020 05:18 AM, HAGIO KAZUHITO wrote:
>>>>>> -----Original Message-----
>>>>>> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Sorry to reply late due to a long festival.
>>>>>>>
>>>>>>> I have tested this patch against v4.15 and latest kernel with small
>>>>>>> modification to meet the situation we discussed here. Both work fine.
>>>>>>>
>>>>>>> The below is the modification of two kernel
>>>>>>>
>>>>>>> test1. latest kernel with two extra modification to expose the problem
>>>>>>> -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
>>>>>>> (mm/sparse.c: reset section's mem_map when fully deactivated), this
>>>>>>> commit work around this bug
>>>>>>> -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>>>>>> write correct address of mem_section into vmcoreinfo"). This will create
>>>>>>> a buggy situation as we discussed here.
>>>>>>> -1.3. fix building bug due to revert
>>>>>>> a0b1280368d1e91ab72f849ef095b4f07a39bbf1
>>>>>>>
>>>>>>> test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
>>>>>>> -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>>>>>> write correct address of mem_section into vmcoreinfo")
>>>>>>>
>>>>>>> So I can not see any problem with my patch.
>>>>>>> Maybe I misunderstand the discussion, but I can not see my original
>>>>>>> patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pingfan
>>>>>>>
>>>>>>
>>>>>> You also need to test the case where 83e3c48729d9 is not present at all. Can
>>>>>> you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4
>>>>>> kernel would not be dumpable with your patch.
>>>>>
>>>>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, it's OK:
>>>>> - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
>>>>> - one more vanilla 4.4.0 kernel with a different config from the above
>>>>>
>>>>> So apparently not all vanilla 4.4 kernels are affected by the patch.
>>>>>
>>>> Sorry, due to touch hardware resource in our lab, I can not have a test
>>>> on v4.4 on a system with hotplug memory yet. I still try to find some
>>>> resource.
>>>>
>>>> But from the logic of this patch, it just does the following changes:
>>>> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
>>>> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
>>>> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
>>>> be skipped.
>>>> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
>>>> the same handling of hot-removed, and be skipped during parsing.
>>>>
>>>> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
>>>> = 0, which can not be violated by the above changes.
>> Ping. As all of us can not reproduce this bug by v4.4 kernel, further
>> more, there is no code analysis, which persuades us this patch will
>> break the makedumpfile on any kernel version.
>>
>> Could this better-to-have patch be accepted?
>>
>> Thanks,
>> Pingfan
>>> Last night, I got a machine to test this scene. After applying my patch
>>> makedumpfile can still work with v4.4 kernel.
>>>
>>> Thanks,
>>> Pingfan
>>>
>
_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
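
For readers following the thread, the skip logic described in the quoted
mail (a hot-removed or never-populated section ends up as NOT_MEMMAP_ADDR
and is skipped later) can be modelled roughly as below. This is only a
Python sketch of the idea, not the makedumpfile C code or the patch
itself. The helper names classify_section() and build_mem_maps() are
hypothetical, and the constant values, including the SECTION_FLAG_BITS
mask, are assumptions for illustration. The real decoding also accounts
for the section's starting pfn, which is left out here.

```python
# Rough model of the per-section decision discussed in the thread.
# All values below are assumptions made for this sketch.
SECTION_MARKED_PRESENT = 1 << 0  # assumed: "present" flag in the low bit
SECTION_FLAG_BITS = 0x3          # hypothetical mask of low flag bits
NOT_MEMMAP_ADDR = 0              # assumed sentinel: no usable mem_map


def classify_section(section_mem_map: int) -> int:
    """Return the decoded mem_map address, or NOT_MEMMAP_ADDR to skip."""
    # Never populated, or hot-removed with section_mem_map reset to 0.
    if section_mem_map == 0:
        return NOT_MEMMAP_ADDR
    # Hot-removed: a stale address remains, but the present flag is gone.
    if not (section_mem_map & SECTION_MARKED_PRESENT):
        return NOT_MEMMAP_ADDR
    # Present section: strip the flag bits to get the address part.
    return section_mem_map & ~SECTION_FLAG_BITS


def build_mem_maps(raw_entries):
    """Model of mem_maps[section_nr] = mem_map for every section."""
    return [classify_section(raw) for raw in raw_entries]


if __name__ == "__main__":
    # Three sections: unpopulated, stale after hot-remove, present.
    for nr, addr in enumerate(build_mem_maps([0x0, 0x2000, 0x3003])):
        state = "skipped" if addr == NOT_MEMMAP_ADDR else hex(addr)
        print(f"section {nr}: {state}")
```

In this model both cases from the mail collapse into the same sentinel,
which is why a v4.4 style section_mem_map = 0 after hot-remove is handled
the same way as a section that was never populated.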