Re: Are all linux_banner type cases covered in verify_version()?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/08/29 0:23, David Mair wrote:
> On 8/20/23 19:37, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> On 2023/08/18 3:08, David Mair wrote:
>>> Hi All,
>>>
>>> Before I consider starting work on a patch for this I'd appreciate more
>>> input.
>>>
>>> I am seeing random cases of crash failing to load reporting an x86_64
>>> coredump reporting a "bad" linux_banner. However, the value displayed as
>>> the banner is:
>>>
>>> 0x65762078756e694c
>>>
>>> which is plainly ASCII text as a 64-bit number and is the little-endian
>>> reversal of the text "Linux ver".
>>>
>>> It's randomly found with specific coredumps and reproduces all times
>>> with that coredump and a given version of crash, though sometimes it
>>> will appear when using a given coredump with one version of crash but
>>> not with another version of crash. What I'm trying to get working is
>>> crash current and the rest of this is experience using crash 8.0.3 only.
>>>
>>> I used gdb crash to debug it through verify_version(). If I breakpoint
>>> there with gdb crash and step through the function I find that in the
>>> section:
>>>
>>>
>>>       if (!(sp = symbol_search("linux_banner")))
>>>           error(FATAL, "linux_banner symbol does not exist?\n");
>>>       else if ((sp->type == 'R') || (sp->type == 'r') ||
>>>           (THIS_KERNEL_VERSION >= LINUX(2,6,11) && (sp->type == 'D' ||
>>> sp->type == 'd')) ||
>>>            (machine_type("ARM") && sp->type == 'T') ||
>>>            (machine_type("ARM64")))
>>>           linux_banner = symbol_value("linux_banner");
>>>       else
>>>           get_symbol_data("linux_banner", sizeof(ulong), &linux_banner);
>>>
>>>
>>> * The if block is not executed, i.e. symbol_search("linux_banner")
>>> succeeded and we have a usable struct syment for "linux_banner" in sp
>>> * The else if block is not executed, all conditions are met or not
>>> relevant except for the the value of sp->type in the case of
>>> THIS_KERNEL_VERSION >= LINUX(2,6,11). But sp->type is 'B', bss segment
>>> * The final else block is executed, we copy sizeof(ulong) bytes of
>>> symbol data from what "linux_banner" refers to into the crash internal
>>> linux_banner variable
>>>
>>> Here's how sp looks at the else if statement in the above code:
>>>
>>> gdb) print *sp
>>> $2 = {value = 18446744071587233984, name = 0x5555566a735b 
>>> "linux_banner",
>>>     val_hash_next = 0x7fffe51e4338, name_hash_next = 0x7fffe51f8d38,
>>>     type = 66 'B', cnt = 1 '\001', flags = 0 '\000', pad2 = 0 '\000'}
>>>
>>> ...and sp->value in hex is:
>>>
>>> (gdb) p/x sp->value
>>> $5 = 0xffffffff818000c0
>>>
>>> Starting crash in --minimal mode with the same core, kernel and
>>> debuginfo so that I can try to read 0xffffffff818000c0 I find:
>>>
>>> crash> rd 0xffffffff818000c0 18
>>> ffffffff818000c0:  65762078756e694c 2e34206e6f697372   Linux version 4.
>>> ffffffff818000d0:  34392d3038312e34 6665642d3533312e   4.180-94.135-def
>>> ffffffff818000e0:  65672820746c7561 6c697562406f6b65   ault (geeko@buil
>>> ffffffff818000f0:  28202974736f6864 7372657620636367   dhost) (gcc vers
>>> ffffffff81800100:  2e382e34206e6f69 2045535553282035   ion 4.8.5 (SUSE
>>> ffffffff81800110:  29202978756e694c 20504d5320312320   Linux) ) #1 SMP
>>> ffffffff81800120:  20766f4e206e6f4d 35353a3930203631   Mon Nov 16 09:55
>>> ffffffff81800130:  204354552037353a 3033282030323032   :57 UTC 2020 (30
>>> ffffffff81800140:  000a293039393633 0000000000000000   36990)..........
>>>
>>> IOW, it is the linux_banner, it's at 0xffffffff818000c0 and if the first
>>> sizeof(ulong) bytes are read into crash's linux_banner variable via
>>> get_symbol_data() it makes a 64-bit number with little-endian revesral
>>> from the string bytes "Linux ver" from the actual linux_banner text, the
>>> same value seen in the error report when crash fails.
>>>
>>> If I repeat the above debug of verify_version() in gdb and at the else
>>> if block in the above C in verify_version() I set var the sp->type to be
>>> 'D' or 'd' then crash's linux_banner is set to 0xffffffff818000c0
>>> (sp->value) and the else if block is executed then the remainder of the
>>> function successfully finds a linux_banner and gets the version
>>> information from it and crash loads.
>>>
>>> In symbol_search() is the value of type in the returned struct syment a
>>> property generated by crash (I thought not and it was based on the
>>> kernel compilation, possible kdummp/makedumpfile but not controlled by
>>> crash)? If I'm correct then is it safe to expect a struct syment with
>>> type 'B' (also 'b' I believe) for linux_banner? Or is there anything
>>> special about a bss segment type that makes it not possible to assume we
>>> can take sp->value as-is for the address of the linux_banner string
>>> details, i.e. we can't safely use symbol_value("linux_banner") to set
>>> the crash linux_banner variable if the struct syment type for
>>> linux_banner is 'B' or 'b'?
>>>
>>> Any comments? I'll write a patch if that's the right direction but I
>>> need a better understanding of why symbols with the type bss segment
>>> aren't already assumed valid sources of a linux_banner address value,
>>> only 'D' and 'd' types are and if there is anything special I don't
>>> understand about 'B' and 'b' types.
>>>
>>
>> It seems that sp->type is set to a value from bfd_get_symbol_info() in
>> store_symbols().  So it's from the vmlinux through the embedded gdb.
>>
>> Anyway, I don't think that the current check gets the point, i.e. we can
>> check the type of the linux_banner symbol directly like this?
>>
>>     switch (get_symbol_type("linux_banner", NULL, NULL))
>>     {
>>         case TYPE_CODE_ARRAY:
>>             linux_banner = sp->value;
>>             break;
>>         case TYPE_CODE_PTR:
>>             get_symbol_data("linux_banner", sizeof(ulong), 
>> &linux_banner);
>>             break;
>>         default:
>>             error(WARNING, "linux_banner is unknown type\n");
>>             linux_banner = 0;
>>             break;
>>     }
>>
>> I don't know why it checks the sp->type, but what it should do
>> originally might be the above, I think.
>>
>> Lianbo, do you have an old vmcore that has "const char *linux_banner"?
>> According to crash commit fce91bec5bef, 2.6.10 and older kernels have
>> it.  I'd like to test the above with it.
> 
> Kazu,
> 
> I've seen the above code solve cases of starting crash giving "bad" 
> linux_banner for three coredumps in a week. I don't want to be  a reason 
> to hold things back and the variety of coredumps is (well, it is only 
> three) but random enough for me to expect the change covers a broad set 
> of dumps.
> 
> It's your change, do you want to make the commit? I can prepare one if 
> you prefer.
> 

If you make a patch, it's helpful to me.

One thing, in the default case, "linux_banner = sp->value" might be 
better than setting to 0 for the likelihood.  There was no need to throw 
the symbol value away, let's try it.

Thanks,
Kazu
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux