On Wed, Feb 19, 2020 at 08:12:41PM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
>
> Do you have any solution or detailed information on the failure on your kernel?
> or could you try this branch? It has an additional patch on top of Pingfan's
> one to avoid the false positive failure that I'm suspecting:
> https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
>
> Pingfan,
> Do you have an output of makedumpfile when the original failure occurs?
> If you don't and it's hard to get it, no need to do so. I just would like to
> add it to your patch if available.
>
> Thanks,
> Kazu

Will try the said branch. Sorry that I couldn't work this out before. I was
trying to reproduce this today, but ended up in a rabbit hole when qemu+KVM
started failing for unrelated reasons after an upgrade.

I'll try to come up with some new results by tomorrow later in the day.

Thanks.
Cascardo.

> > -----Original Message-----
> > On 02/12/2020 12:11 PM, piliu wrote:
> > >
> > >
> > > On 02/06/2020 11:46 AM, piliu wrote:
> > >>
> > >>
> > >> On 02/05/2020 05:18 AM, HAGIO KAZUHITO wrote:
> > >>>> -----Original Message-----
> > >>>> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> Sorry to reply late due to a long festival.
> > >>>>>
> > >>>>> I have tested this patch against v4.15 and latest kernel with small
> > >>>>> modification to meet the situation we discussed here. Both work fine.
> > >>>>>
> > >>>>> The below is the modification of two kernel
> > >>>>>
> > >>>>> test1. latest kernel with two extra modification to expose the problem
> > >>>>> -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
> > >>>>> (mm/sparse.c: reset section's mem_map when fully deactivated), this
> > >>>>> commit work around this bug
> > >>>>> -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > >>>>> write correct address of mem_section into vmcoreinfo").
> > >>>>> This will create a buggy situation as we discussed here.
> > >>>>> -1.3. fix building bug due to revert
> > >>>>> a0b1280368d1e91ab72f849ef095b4f07a39bbf1
> > >>>>>
> > >>>>> test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
> > >>>>> -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > >>>>> write correct address of mem_section into vmcoreinfo")
> > >>>>>
> > >>>>> So I can not see any problem with my patch.
> > >>>>> Maybe I misunderstand the discussion, but I can not see my original
> > >>>>> patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Pingfan
> > >>>>>
> > >>>>
> > >>>> You also need to test the case where 83e3c48729d9 is not present at all. Can
> > >>>> you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4
> > >>>> kernel would not be dumpable with your patch.
> > >>>
> > >>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, it's OK:
> > >>> - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
> > >>> - one more vanilla 4.4.0 kernel with a different config from the above
> > >>>
> > >>> So apparently not all vanilla 4.4 kernels are affected by the patch.
> > >>>
> > >> Sorry, due to touch hardware resource in our lab, I can not have a test
> > >> on v4.4 on a system with hotplug memory yet. I still try to find some
> > >> resource.
> > >>
> > >> But from the logic of this patch, it just does the following changes:
> > >> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
> > >> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
> > >> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
> > >> be skipped.
> > >> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
> > >> the same handling of hot-removed, and be skipped during parsing.
> > >>
> > >> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
> > >> = 0, which can not be violated by the above changes.
> > Ping. As all of us can not reproduce this bug by v4.4 kernel, further
> > more, there is no code analysis, which persuades us this patch will
> > break the makedumpfile on any kernel version.
> >
> > Could this better-to-have patch be accepted?
> >
> > Thanks,
> > Pingfan
> > > Last night, I got a machine to test this scene. After applying my patch
> > > makedumpfile can still work with v4.4 kernel.
> > >
> > > Thanks,
> > > Pingfan
> > >
>

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
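
For readers following the thread, the patch logic Pingfan describes above
(a section whose section_mem_map is NULL, or lacks SECTION_MARKED_PRESENT,
gets mem_maps[section_nr] = NOT_MEMMAP_ADDR and is skipped later) might be
sketched roughly as below. This is only an illustration under stated
assumptions, not the makedumpfile code or the patch itself: the sketch_*
functions and SKETCH_FLAG_MASK are hypothetical, and the constant values
are chosen for the example rather than taken from the kernel or
makedumpfile headers.

```c
#include <stddef.h>

/* Illustrative values only; the real definitions live in the kernel
 * and makedumpfile sources and may differ. */
#define SECTION_MARKED_PRESENT 0x1UL
#define SKETCH_FLAG_MASK       0x3UL  /* hypothetical: low bits carry flags */
#define NOT_MEMMAP_ADDR        0x0UL

/* Hypothetical helper: pick the mem_map value to record for one section,
 * given the raw mem_section.section_mem_map word read from the vmcore. */
static unsigned long
sketch_section_mem_map(unsigned long section_mem_map)
{
	/* Never populated, or hot-removed with the field reset to 0
	 * (the v4.4 sparse_remove_one_section() behaviour from the thread). */
	if (section_mem_map == 0)
		return NOT_MEMMAP_ADDR;

	/* Hot-removed: stale pointer left behind, present flag cleared. */
	if (!(section_mem_map & SECTION_MARKED_PRESENT))
		return NOT_MEMMAP_ADDR;

	/* Present section: strip the flag bits to get the mem_map address. */
	return section_mem_map & ~SKETCH_FLAG_MASK;
}

/* Fill mem_maps[] for n sections and return how many will be parsed;
 * entries equal to NOT_MEMMAP_ADDR are the ones skipped later. */
static size_t
sketch_fill_mem_maps(const unsigned long *section_mem_map,
		     unsigned long *mem_maps, size_t n)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++) {
		mem_maps[i] = sketch_section_mem_map(section_mem_map[i]);
		if (mem_maps[i] != NOT_MEMMAP_ADDR)
			usable++;
	}
	return usable;
}
```

Under these assumptions, both the "never populated" and the "hot-removed"
cases collapse into the same NOT_MEMMAP_ADDR marker, which matches the
argument in the thread that assigning section_mem_map = 0 on v4.4 is
already covered by the first check.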