> -----Original Message-----
> On Fri, Sep 27, 2019 at 08:39:04PM +0000, Kazuhito Hagio wrote:
> > > -----Original Message-----
> > > On Thu, Sep 26, 2019 at 06:41:48PM +0000, Kazuhito Hagio wrote:
> > >
> > > > > -----Original Message-----
> > > > > If info->max_mapnr and pfn_memhole are equal, we divide by zero when
> > > > > trying to determine the 'shrinking' value.
> > > > >
> > > > > On the system I saw this error, we arrived at this function with
> > > > > info->max_mapnr:0x0000000001080000 pfn_memhole:0x0000000001080000
> > > >
> > > > Thank you for the patch.
> > > > I suppose that you see the error with the -E option, right?
> > > >
> > > > It seems that the -E option has some problems with its statistics,
> > > > so I'm checking whether there is a better way to fix this.
> > >
> > > Yes, we use the -E option.
> > > We manage to get useful info from the generated dump after this fix, so
> > > it seems it only affects the statistics output.
> >
> > OK, the statistics in cyclic mode with the -E option are completely wrong,
> > but a possible fix is likely to affect the whole of cyclic processing, so
> > I'll just cover the hole with your patch and leave the statistics problem
> > as a known issue for now. I'll revisit it when I have time.
> >
> > The patch was applied to the devel branch.
>
> While this patch does avoid the divide by zero, some further analysis
> shows that there seems to be some deeper problem when we encounter this
> 'original pages = 0' situation.
>
> Take a look at the attached output from makedumpfile.
>
> Key part in the summary:
>
> [ 518.819690] Original pages          : 0x0000000000000000
> [ 518.828894] Excluded pages          : 0x0000000003decd15
> [ 518.838635] Pages filled with zero  : 0x00000000000210ee
> [ 518.849920] Non-private cache pages : 0x000000000000271a
> [ 518.861218] Private cache pages     : 0x000000000000da47
> [ 518.872502] User process data pages : 0x0000000003d6bdc8
> [ 518.883786] Free pages              : 0x000000000004fcfe
> [ 518.895070] Hwpoison pages          : 0x0000000000000000
> [ 518.906356] Offline pages           : 0x0000000000000000
> [ 518.917659] Remaining pages         : 0xfffffffffc2132eb
> [ 518.927398] Memory Hole             : 0x0000000004080000
>
> In this case, 'remaining pages' has gone negative, which looks concerning.

This is the known issue I mentioned above, and I'm looking for a safe fix.
How does this patch work for you?

--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -56,6 +56,9 @@ static void first_cycle(mdf_pfn_t start, mdf_pfn_t max, struct cycle *cycle)
 	if (cycle->end_pfn > max)
 		cycle->end_pfn = max;
 
+	if (cycle->start_pfn < start)
+		cycle->start_pfn = start;
+
 	cycle->exclude_pfn_start = 0;
 	cycle->exclude_pfn_end = 0;
 }
@@ -7595,6 +7598,9 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	}
 
 	for (pfn = MAX(pfn_start, cycle.start_pfn); pfn < cycle.end_pfn; pfn++) {
+		if (info->flag_cyclic)
+			pfn_memhole--;
+
 		if (!is_dumpable(info->bitmap2, pfn, &cycle)) {
 			num_excluded++;
 			if ((pfn == pfn_end - 1) && frac_tail)

If it looks good, I'll look into its side effects further, but it might
take some time.

>
> And the crashdump seems corrupt:
>
> 'crash' complains:
> WARNING: possibly corrupt Elf64_Nhdr: n_namesz: 2079035392 n_descsz: 3 n_type: 1000
>
> vmcore-dmesg complains "Missing the log_buf symbol", even though the makedumpfile log
> shows it was present at ffffffff822510a0
>
> Readelf seems to think the notes sections are mangled.
>
> # readelf -n vmcore
>
> Displaying notes found at file offset 0x00015468 with length 0x0000556c:
>   Owner                Data size       Description
>                        0x00000007      Unknown note type: (0x727c79d4)
> readelf: vmcore: Warning: Corrupt note: name size is too big: 7beb9000
>   (NONE)               0x00000003      Unknown note type: (0x00001000)
> readelf: vmcore: Warning: Corrupt note: name size is too big: 55a000
>   (NONE)               0x00000000      Unknown note type: (0x00000000)
>   (NONE)               0x00000001      Unknown note type: (0x00000007)
> readelf: vmcore: Warning: note with invalid namesz and/or descsz found at offset 0x44
> readelf: vmcore: Warning: type: 0xffff8803, namesize: 0x00000000, descsize: 0x7c413000

As far as I can tell, the statistics issue does not corrupt the dumpfile itself.
Could you show me the output of "readelf -a vmcore"?
Does this issue reproduce every time?

Thanks,
Kazu

>
> Any thoughts on where to add additional debugging in makedumpfile?
>
> Dave

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec