Re: [PATCH] diskdump: fix read_diskdump() returns successfully for illegal 0-size page descriptors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/09/07 19:54, Daisuke Hatayama (Fujitsu) wrote:
> Hagio-san,
> 
> I'm now facing the issue on RHEL8 and RHEL9 that crash dump files are
> corrupted although makedumpfile terminated in successful status, in
> the issue which I found the issue this patch tries to fix.
> 
> I made a program to check where is corrupted in detail in such crash
> dump file. The following is example:
> 
> 	# ./validatekdump -v vmcore
> 	disk dump header:
> 	  signature: "KDUMP   ^F"
> 	  header_version: 6
> 	  block_size: 4096
> 	  sub_hdr_size: 18
> 	  bitmap_blocks: 1056
> 	  total_ram_blocks: 0
> 	  device_blocks: 0
> 	  written_blocks: 0
> 
> 	kdump sub header:
> 	  phys_base: 0000000172200000
> 	  dump_level: 31
> 	  start_pfn_64: 0
> 	  end_pfn_64: 0
> 	  max_mapnr_64: 17301504
> 	Total ram pages: 16597666
> 	Total dumpable pages: 767664
> 
> 	PFN OFFSET SIZE FLAGS PAGE_FLAGS
> 	1024 22827136 4096 00000000 0000000000000000 valid
> 	1025 22827136 4096 00000000 0000000000000000 valid
> 	...strip...
> 	1102265 54518942 143 00000000 00000000033cd80d invalid
> 	1102266 60779170 42127 00000000 00000000033cd80d invalid
> 	1102267 61070498 38287 00000000 00000000033cd80d invalid
> 
> In my evaluation, disk dump header, kdump sub header, two bitmaps
> appears consistent. The corrupted part begins in the middle of page
> descriptor table. Page descriptors are corrupted in various patterns,
> for example:
> 
> - offset is larger than size of the crash dump file.
> - size is larger than 4096 bytes.
> - size is 0.
> - size is smaller than 4096 bytes but flags doesn't hold the value representing compression.
> - flags doesn't consist of the values defined by DUMP_DH_XXXXX macros.
> - page_flags is not 0 although it is unused.
> 
> Corrupted part appears to reach data portion. Some crash dump file
> contains valid 49489 page descriptors whose data portion is compressed
> with LZO according to the flags member, but only 7 of them are
> decompressed successfully.
> 
> Unfortunately, information to investigate this issue is just these
> crash dump files. I don't have machines where the issue is
> reproducible. So, while I'm making effort to reproduce the issue, I'm
> checking known similar issues on related components where the root
> cause of this issue possibly lies in, such as makedumpfile,
> filesystem, disk driver, or any hardware issue.
> 
> I looked into git log but I didn't find such commits.
> 
> Also, I have never seen issues resulting in this kind of data corruption on makedumpfile.
> 
> Then, would you tell me if you know some past or current issues on
> makedumpfile that are similar to the above situation?

Hmm, I've tested makedumpfile every week with the latest upstream and 
RHEL8/9 kernels on various machines, but I've also never seen such 
corruption, IIRC.

Hope you find a reproducer..

Thanks,
Kazu

> 
> Thanks.
> HATAYAMA, Daisuke
> 
>> ________________________________________
>> From: Crash-utility <crash-utility-bounces@xxxxxxxxxx> on behalf of HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxx>
>> Sent: Thursday, September 7, 2023 18:38
>> To: crash-utility@xxxxxxxxxx
>> Subject:  [PATCH] diskdump: fix read_diskdump() returns successfully for illegal 0-size page descriptors
>>
>> read_diskdump() returns successfully for illegal 0-size page
>> descriptors. Page descriptors are illegal if their size member holds 0
>> because makedumpfile never puts 0 there because any data never result
>> in 0 byte by compression. If page descriptors hold 0 in size member,
>> it means the crash dump file is corrupted for some reason.
>>
>> The root cause of this is that sanity check of function cache_page()
>> doesn't focus on such 0-size page descriptors. Then, the 0-size page
>> descriptor is passed to pread(), pread() immediately returns 0
>> successfully because read data is 0 byte, and then read_diskdump()
>> returns successfully.
>>
>> To fix this issue, let the sanity check take into account such 0-size
>> page descriptors and read_diskdump() result in READ_ERROR.
>>
>> Signed-off-by: HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxx>
>> ---
>>   diskdump.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/diskdump.c b/diskdump.c
>> index 2c284ff..2be7cc7 100644
>> --- a/diskdump.c
>> +++ b/diskdump.c
>> @@ -1210,7 +1210,7 @@ cache_page(physaddr_t paddr)
>>                  return ret;
>>
>>          /* sanity check */
>> -       if (pd.size > block_size)
>> +       if (pd.size > block_size || !pd.size)
>>                  return READ_ERROR;
>>
>>          /* read page data */
>> --
>> 2.25.1
>>
>> --
>> Crash-utility mailing list
>> Crash-utility@xxxxxxxxxx
>> https://listman.redhat.com/mailman/listinfo/crash-utility
>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux