On 2023/09/07 19:54, Daisuke Hatayama (Fujitsu) wrote: > Hagio-san, > > I'm now facing the issue on RHEL8 and RHEL9 that crash dump files are > corrupted although makedumpfile terminated in successful status, in > the issue which I found the issue this patch tries to fix. > > I made a program to check where is corrupted in detail in such crash > dump file. The following is example: > > # ./validatekdump -v vmcore > disk dump header: > signature: "KDUMP ^F" > header_version: 6 > block_size: 4096 > sub_hdr_size: 18 > bitmap_blocks: 1056 > total_ram_blocks: 0 > device_blocks: 0 > written_blocks: 0 > > kdump sub header: > phys_base: 0000000172200000 > dump_level: 31 > start_pfn_64: 0 > end_pfn_64: 0 > max_mapnr_64: 17301504 > Total ram pages: 16597666 > Total dumpable pages: 767664 > > PFN OFFSET SIZE FLAGS PAGE_FLAGS > 1024 22827136 4096 00000000 0000000000000000 valid > 1025 22827136 4096 00000000 0000000000000000 valid > ...strip... > 1102265 54518942 143 00000000 00000000033cd80d invalid > 1102266 60779170 42127 00000000 00000000033cd80d invalid > 1102267 61070498 38287 00000000 00000000033cd80d invalid > > In my evaluation, disk dump header, kdump sub header, two bitmaps > appears consistent. The corrupted part begins in the middle of page > descriptor table. Page descriptors are corrupted in various patterns, > for example: > > - offset is larger than size of the crash dump file. > - size is larger than 4096 bytes. > - size is 0. > - size is smaller than 4096 bytes but flags doesn't hold the value representing compression. > - flags doesn't consist of the values defined by DUMP_DH_XXXXX macros. > - page_flags is not 0 although it is unused. > > Corrupted part appears to reach data portion. Some crash dump file > contains valid 49489 page descriptors whose data portion is compressed > with LZO according to the flags member, but only 7 of them are > decompressed successfully. > > Unfortunately, information to investigate this issue is just these > crash dump files. I don't have machines where the issue is > reproducible. So, while I'm making effort to reproduce the issue, I'm > checking known similar issues on related components where the root > cause of this issue possibly lies in, such as makedumpfile, > filesystem, disk driver, or any hardware issue. > > I looked into git log but I didn't find such commits. > > Also, I have never seen issues resulting in this kind of data corruption on makedumpfile. > > Then, would you tell me if you know some past or current issues on > makedumpfile that are similar to the above situation? Hmm, I've tested makedumpfile every week with the latest upstream and RHEL8/9 kernels on various machines, but I've also never seen such corruption, IIRC. Hope you find a reproducer.. Thanks, Kazu > > Thanks. > HATAYAMA, Daisuke > >> ________________________________________ >> From: Crash-utility <crash-utility-bounces@xxxxxxxxxx> on behalf of HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxx> >> Sent: Thursday, September 7, 2023 18:38 >> To: crash-utility@xxxxxxxxxx >> Subject: [PATCH] diskdump: fix read_diskdump() returns successfully for illegal 0-size page descriptors >> >> read_diskdump() returns successfully for illegal 0-size page >> descriptors. Page descriptors are illegal if their size member holds 0 >> because makedumpfile never puts 0 there because any data never result >> in 0 byte by compression. If page descriptors hold 0 in size member, >> it means the crash dump file is corrupted for some reason. >> >> The root cause of this is that sanity check of function cache_page() >> doesn't focus on such 0-size page descriptors. Then, the 0-size page >> descriptor is passed to pread(), pread() immediately returns 0 >> successfully because read data is 0 byte, and then read_diskdump() >> returns successfully. >> >> To fix this issue, let the sanity check take into account such 0-size >> page descriptors and read_diskdump() result in READ_ERROR. >> >> Signed-off-by: HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxx> >> --- >> diskdump.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/diskdump.c b/diskdump.c >> index 2c284ff..2be7cc7 100644 >> --- a/diskdump.c >> +++ b/diskdump.c >> @@ -1210,7 +1210,7 @@ cache_page(physaddr_t paddr) >> return ret; >> >> /* sanity check */ >> - if (pd.size > block_size) >> + if (pd.size > block_size || !pd.size) >> return READ_ERROR; >> >> /* read page data */ >> -- >> 2.25.1 >> >> -- >> Crash-utility mailing list >> Crash-utility@xxxxxxxxxx >> https://listman.redhat.com/mailman/listinfo/crash-utility >> Contribution Guidelines: https://github.com/crash-utility/crash/wiki -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility Contribution Guidelines: https://github.com/crash-utility/crash/wiki