On 2023/08/29 0:23, David Mair wrote: > On 8/20/23 19:37, HAGIO KAZUHITO(萩尾 一仁) wrote: >> On 2023/08/18 3:08, David Mair wrote: >>> Hi All, >>> >>> Before I consider starting work on a patch for this I'd appreciate more >>> input. >>> >>> I am seeing random cases of crash failing to load reporting an x86_64 >>> coredump reporting a "bad" linux_banner. However, the value displayed as >>> the banner is: >>> >>> 0x65762078756e694c >>> >>> which is plainly ASCII text as a 64-bit number and is the little-endian >>> reversal of the text "Linux ver". >>> >>> It's randomly found with specific coredumps and reproduces all times >>> with that coredump and a given version of crash, though sometimes it >>> will appear when using a given coredump with one version of crash but >>> not with another version of crash. What I'm trying to get working is >>> crash current and the rest of this is experience using crash 8.0.3 only. >>> >>> I used gdb crash to debug it through verify_version(). If I breakpoint >>> there with gdb crash and step through the function I find that in the >>> section: >>> >>> >>> if (!(sp = symbol_search("linux_banner"))) >>> error(FATAL, "linux_banner symbol does not exist?\n"); >>> else if ((sp->type == 'R') || (sp->type == 'r') || >>> (THIS_KERNEL_VERSION >= LINUX(2,6,11) && (sp->type == 'D' || >>> sp->type == 'd')) || >>> (machine_type("ARM") && sp->type == 'T') || >>> (machine_type("ARM64"))) >>> linux_banner = symbol_value("linux_banner"); >>> else >>> get_symbol_data("linux_banner", sizeof(ulong), &linux_banner); >>> >>> >>> * The if block is not executed, i.e. symbol_search("linux_banner") >>> succeeded and we have a usable struct syment for "linux_banner" in sp >>> * The else if block is not executed, all conditions are met or not >>> relevant except for the the value of sp->type in the case of >>> THIS_KERNEL_VERSION >= LINUX(2,6,11). But sp->type is 'B', bss segment >>> * The final else block is executed, we copy sizeof(ulong) bytes of >>> symbol data from what "linux_banner" refers to into the crash internal >>> linux_banner variable >>> >>> Here's how sp looks at the else if statement in the above code: >>> >>> gdb) print *sp >>> $2 = {value = 18446744071587233984, name = 0x5555566a735b >>> "linux_banner", >>> val_hash_next = 0x7fffe51e4338, name_hash_next = 0x7fffe51f8d38, >>> type = 66 'B', cnt = 1 '\001', flags = 0 '\000', pad2 = 0 '\000'} >>> >>> ...and sp->value in hex is: >>> >>> (gdb) p/x sp->value >>> $5 = 0xffffffff818000c0 >>> >>> Starting crash in --minimal mode with the same core, kernel and >>> debuginfo so that I can try to read 0xffffffff818000c0 I find: >>> >>> crash> rd 0xffffffff818000c0 18 >>> ffffffff818000c0: 65762078756e694c 2e34206e6f697372 Linux version 4. >>> ffffffff818000d0: 34392d3038312e34 6665642d3533312e 4.180-94.135-def >>> ffffffff818000e0: 65672820746c7561 6c697562406f6b65 ault (geeko@buil >>> ffffffff818000f0: 28202974736f6864 7372657620636367 dhost) (gcc vers >>> ffffffff81800100: 2e382e34206e6f69 2045535553282035 ion 4.8.5 (SUSE >>> ffffffff81800110: 29202978756e694c 20504d5320312320 Linux) ) #1 SMP >>> ffffffff81800120: 20766f4e206e6f4d 35353a3930203631 Mon Nov 16 09:55 >>> ffffffff81800130: 204354552037353a 3033282030323032 :57 UTC 2020 (30 >>> ffffffff81800140: 000a293039393633 0000000000000000 36990).......... >>> >>> IOW, it is the linux_banner, it's at 0xffffffff818000c0 and if the first >>> sizeof(ulong) bytes are read into crash's linux_banner variable via >>> get_symbol_data() it makes a 64-bit number with little-endian revesral >>> from the string bytes "Linux ver" from the actual linux_banner text, the >>> same value seen in the error report when crash fails. >>> >>> If I repeat the above debug of verify_version() in gdb and at the else >>> if block in the above C in verify_version() I set var the sp->type to be >>> 'D' or 'd' then crash's linux_banner is set to 0xffffffff818000c0 >>> (sp->value) and the else if block is executed then the remainder of the >>> function successfully finds a linux_banner and gets the version >>> information from it and crash loads. >>> >>> In symbol_search() is the value of type in the returned struct syment a >>> property generated by crash (I thought not and it was based on the >>> kernel compilation, possible kdummp/makedumpfile but not controlled by >>> crash)? If I'm correct then is it safe to expect a struct syment with >>> type 'B' (also 'b' I believe) for linux_banner? Or is there anything >>> special about a bss segment type that makes it not possible to assume we >>> can take sp->value as-is for the address of the linux_banner string >>> details, i.e. we can't safely use symbol_value("linux_banner") to set >>> the crash linux_banner variable if the struct syment type for >>> linux_banner is 'B' or 'b'? >>> >>> Any comments? I'll write a patch if that's the right direction but I >>> need a better understanding of why symbols with the type bss segment >>> aren't already assumed valid sources of a linux_banner address value, >>> only 'D' and 'd' types are and if there is anything special I don't >>> understand about 'B' and 'b' types. >>> >> >> It seems that sp->type is set to a value from bfd_get_symbol_info() in >> store_symbols(). So it's from the vmlinux through the embedded gdb. >> >> Anyway, I don't think that the current check gets the point, i.e. we can >> check the type of the linux_banner symbol directly like this? >> >> switch (get_symbol_type("linux_banner", NULL, NULL)) >> { >> case TYPE_CODE_ARRAY: >> linux_banner = sp->value; >> break; >> case TYPE_CODE_PTR: >> get_symbol_data("linux_banner", sizeof(ulong), >> &linux_banner); >> break; >> default: >> error(WARNING, "linux_banner is unknown type\n"); >> linux_banner = 0; >> break; >> } >> >> I don't know why it checks the sp->type, but what it should do >> originally might be the above, I think. >> >> Lianbo, do you have an old vmcore that has "const char *linux_banner"? >> According to crash commit fce91bec5bef, 2.6.10 and older kernels have >> it. I'd like to test the above with it. > > Kazu, > > I've seen the above code solve cases of starting crash giving "bad" > linux_banner for three coredumps in a week. I don't want to be a reason > to hold things back and the variety of coredumps is (well, it is only > three) but random enough for me to expect the change covers a broad set > of dumps. > > It's your change, do you want to make the commit? I can prepare one if > you prefer. > If you make a patch, it's helpful to me. One thing, in the default case, "linux_banner = sp->value" might be better than setting to 0 for the likelihood. There was no need to throw the symbol value away, let's try it. Thanks, Kazu -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/crash-utility Contribution Guidelines: https://github.com/crash-utility/crash/wiki