On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote: > Hi, > > Sorry to reply late due to a long festival. > > I have tested this patch against v4.15 and latest kernel with small > modification to meet the situation we discussed here. Both work fine. > > The below is the modification of two kernel > > test1. latest kernel with two extra modification to expose the problem > -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c > (mm/sparse.c: reset section's mem_map when fully deactivated), this > commit work around this bug > -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump: > write correct address of mem_section into vmcoreinfo"). This will create > a buggy situation as we discussed here. > -1.3. fix building bug due to revert > a0b1280368d1e91ab72f849ef095b4f07a39bbf1 > > test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1. > -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump: > write correct address of mem_section into vmcoreinfo") > > So I can not see any problem with my patch. > Maybe I misunderstand the discussion, but I can not see my original > patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1. > > Thanks, > Pingfan > You also need to test the case where 83e3c48729d9 is not present at all. Can you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4 kernel would not be dumpable with your patch. Thanks. Cascardo. > On 01/29/2020 03:33 AM, Thadeu Lima de Souza Cascardo wrote: > > On Tue, Jan 28, 2020 at 05:03:12PM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote: > >> Hi Cascardo, > >> > >>> -----Original Message----- > >>> On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo wrote: > >>>> Sorry for taking too long to respond, as I was on vacation. > >>>> > >>>> The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are > >>>> not supported anymore. In a way that it's even hard for me to test them. > >>>> > >>>> However, I managed to test it, and those two lines are definitively needed > >>>> to dump a VM running such a kernel. Is removing them really needed to fix > >>>> this issue? > >>>> > >>>> Otherwise, I would rather keep them. > >>>> > >>>> Thanks. > >>>> Cascardo. > >>> > >>> By the way, I was too fast in sending this. We really need to keep those lines > >>> as makedumpfile will fail to dump a 4.4 kernel with this patch as is. > >> > >> Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1? > >> Could you elaborate on how it fails? > > > > No, it doesn't have either, so my guess is it would fail on upstream 4.4 as > > well, so anything that doesn't have 83e3c48729d9. > > > > That's what I get on that 4.4 kernel (4.4.0-171-generic): > > > > # ./makedumpfile /proc/vmcore ../dump > > get_mem_section: Could not validate mem_section. > > get_mm_sparsemem: Can't get the address of mem_section. > > > > makedumpfile Failed. > > # > > > > So, now, I have a better grasp of the whole logic, and understand why it fails > > with this patch. > > > > So, we need to either interpret the mem_section as a pointer to the array of a > > pointer to the pointer to the array. The only case the second option is valid > > is when sparse_extreme is on, so we don't even need to check the second case > > when it's off. > > > > Then, we check that interpreting it either way is valid. If it's valid in both > > interpretations, we can't decide which to use, and will fail. So far, we > > haven't seen any case in the field where that would accidentally happen. But in > > case it does, we should rather fail to dump and fallback to copying, than > > creating a bogus compressed dump. > > > > When this patch is applied, a kernel which does not have 83e3c48729d9, and > > thus, has mem_section as a direct pointer to the array, it so happens that we > > don't detect the pointer to pointer to the array case as invalid, thus failing > > to dump. > > > > The way we validate is that the mem_maps should either have the PRESENT bit or > > be NULL. Now, that assumption may be invalid, and we would need to do the > > validation some other way. I can test the cases where that assumption is > > invalid in a 4.4 kernel and see how to fix this in a satisfactory way. > > > > Going through the code once again, I don't see how the second section of the > > patch would be correct by itself too. I will think it through a little more and > > see if I can come up with a solution. > > > > Regards. > > Cascardo. > > > >> > >> I'm thinking that Pingfan's thought may help: > >>>> I think it could be if/else, no need to call twice. > >> > >> Thanks, > >> Kazu > >> > >>> > >>> Cascardo. > > > _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec