On Thu, Jan 23, 2020 at 05:07:50PM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote: > Hi Pingfan, > > Sorry, I had been busy, then was looking into its history a bit. > > > -----Original Message----- > > On 01/20/2020 04:59 PM, Baoquan He wrote: > > > On 01/20/20 at 09:33am, David Hildenbrand wrote: > > >> > > >> > > >>> Am 20.01.2020 um 03:25 schrieb Pingfan Liu <piliu@xxxxxxxxxx>: > > >>> > > >>> After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section > > >>> hotplug"), when hot-removed, section_mem_map is still encoded with section > > >>> start pfn, not NULL. This break the current makedumpfile. > > Could you add kernel version which this started and error message, > if possible, for someone hitting this issue to find the patch. > > > >>> > > >>> Whatever section_mem_map coding info after hot-removed, it is reliable > > >>> just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this > > >>> way. > > >>> > > >>> Signed-off-by: Pingfan Liu <piliu@xxxxxxxxxx> > > >>> To: kexec@xxxxxxxxxxxxxxxxxxx > > >>> Cc: Kazuhito Hagio <k-hagio@xxxxxxxxxxxxx> > > >>> Cc: Baoquan He <bhe@xxxxxxxxxx> > > >>> Cc: David Hildenbrand <david@xxxxxxxxxx> > > >>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > >>> Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > > >>> Cc: Oscar Salvador <osalvador@xxxxxxx> > > >>> Cc: Michal Hocko <mhocko@xxxxxxxxxx> > > >>> Cc: Qian Cai <cai@xxxxxx> > > >>> --- > > >>> makedumpfile.c | 6 +----- > > >>> 1 file changed, 1 insertion(+), 5 deletions(-) > > >>> > > >>> diff --git a/makedumpfile.c b/makedumpfile.c > > >>> index e290fbd..ab40a58 100644 > > >>> --- a/makedumpfile.c > > >>> +++ b/makedumpfile.c > > >>> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned long *map_mask) > > >>> map = ULONG(mem_section + OFFSET(mem_section.section_mem_map)); > > >>> mask = SECTION_MAP_MASK; > > >>> *map_mask = map & ~mask; > > >>> - if (map == 0x0) > > >>> - *map_mask |= SECTION_MARKED_PRESENT; > > This should be a trick to let map==0x0 survive the logic > > if (!(map_mask & SECTION_MARKED_PRESENT)) { > > return FALSE; > > } > > in validate_mem_section(). > > >> > > >> Why was that added in the first place? This looks like dome compat handling to me. Are we sure we can > > remove it? > > The logic introduced by commit 14876c45aef ("[PATCH makedumpfile] handle > > mem_section as either a pointer or an array") > > Combining section_mem_map_addr() and the following logic in > > validate_mem_section() > > if (mem_map == 0) { > > mem_map = NOT_MEMMAP_ADDR; > > } > > > > I reached the same conclusion as Baoquan's. > > > > > > > > > The original code assumes that there are only two kinds of mem_section, > > > one is empty value, the second is a present one. This removing behaves > > > the same as the old code, I don't see a problem with it. > > > > > > I tried to understand the code, but I don't know why it need call > > > validate_mem_section() twice and compare the returned value. > > I think it could be if/else, no need to call twice. > > It looks to me that commit 14876c45aef ("[PATCH makedumpfile] handle mem_section > as either a pointer or an array") was intended to support kernels which had > 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") > and did not have > a0b1280368d1 ("kdump: write correct address of mem_section into vmcoreinfo"). > > Apparently there were some released Ubuntu kernels like this. > > So, if we need that logic, your patch seems > - support kernels which section_mem_map is non-NULL for hot-removed memory, > - but might increase the possiblity of false-positive. > > I think this is a tradeoff between the above two, but the latter would be > small enough. And I'm testing the patch and OK with 10 vmcores so far. > > > Cc Thadeu Lima de Souza Cascardo, who contributes the original code, and > > may have some idea about it. > > So if Cascardo doesn't have any objection, I will accept. Sorry for taking too long to respond, as I was on vacation. The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are not supported anymore. In a way that it's even hard for me to test them. However, I managed to test it, and those two lines are definitively needed to dump a VM running such a kernel. Is removing them really needed to fix this issue? Otherwise, I would rather keep them. Thanks. Cascardo. > > Thanks, > Kazu > > P.S. My email address has been changed to k-hagio-ab@xxxxxxx. > Please send email to this address in the future. Thanks. _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec