kdump issues with 4.11 kernel

bhsharma@xxxxxxxxxx (Bhupesh Sharma) · Tue, 5 Dec 2017 01:29:50 +0530

Hello Anil,

On Thu, Nov 30, 2017 at 5:59 PM, Gurumurthy, Anil
<Anil.Gurumurthy at cavium.com> wrote:
> Hi Bhupesh,
>   I tried to get some log messages for the crash kernel, but was unable to get anything.
> echo c > /proc/sysrq-trigger
> simply hangs without any messages on the console.

Did you try setting earlycon or earlyprintk in the bootargs? Something
like this:

earlycon=pl011,mmio32,0xff78ed1000

depending on the underlying UART device on your board. For
example, here I assumed a pl011 UART is used to display console messages.
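A minimal sketch of deriving a capture-kernel command line from the primary one and appending an early console option. The helper name `build_crash_cmdline` is hypothetical, and dropping `crashkernel=`, `rhgb` and `quiet` (so boot messages are not hidden) is an assumption made for debugging, not something the thread prescribes. The pl011 option is the example from above. Note that the primary kernel log later in this thread is from an x86 machine, where an 8250-style option such as `earlycon=uart8250,io,0x3f8,115200` or `earlyprintk=serial,ttyS0,115200` is more likely to apply (the port address is an assumption).

```python
def build_crash_cmdline(primary_cmdline: str, earlycon: str,
                        drop=("crashkernel", "rhgb", "quiet")) -> str:
    """Hypothetical helper: reuse the primary cmdline, minus a few params, plus earlycon."""
    kept = []
    for token in primary_cmdline.split():
        # Compare on the parameter name only, so "crashkernel=128M" matches "crashkernel".
        name = token.split("=", 1)[0]
        # Also drop any existing early console setting so only the new one applies.
        if name in drop or name in ("earlycon", "earlyprintk"):
            continue
        kept.append(token)
    kept.append(earlycon)
    return " ".join(kept)


primary = ("BOOT_IMAGE=/vmlinuz-4.11.12+ root=/dev/mapper/rhel-root ro "
           "rd.lvm.lv=rhel/swap crashkernel=128M rd.lvm.lv=rhel/root rhgb quiet")
print(build_crash_cmdline(primary, "earlycon=pl011,mmio32,0xff78ed1000"))
```

The resulting string could then be handed to the capture kernel with kexec's `--append="..."` option when loading it via `kexec -p`.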

Regards,
Bhupesh

>
> Thanks,
> Anil
> -----Original Message-----
> From: Gurumurthy, Anil
> Sent: 29 November 2017 16:02
> To: 'Bhupesh Sharma' <bhsharma at redhat.com>
> Cc: Dave Young <dyoung at redhat.com>; kexec at lists.infradead.org
> Subject: RE: kdump issues with 4.11 kernel
>
>
>
> -----Original Message-----
> From: Bhupesh Sharma [mailto:bhsharma at redhat.com]
> Sent: 29 November 2017 15:50
> To: Gurumurthy, Anil <Anil.Gurumurthy at cavium.com>
> Cc: Dave Young <dyoung at redhat.com>; kexec at lists.infradead.org
> Subject: Re: kdump issues with 4.11 kernel
>
> On Wed, Nov 29, 2017 at 3:36 PM, Gurumurthy, Anil <Anil.Gurumurthy at cavium.com> wrote:
>>
>>
>> -----Original Message-----
>> From: Bhupesh Sharma [mailto:bhsharma at redhat.com]
>> Sent: 29 November 2017 15:16
>> To: Gurumurthy, Anil <Anil.Gurumurthy at cavium.com>
>> Cc: Dave Young <dyoung at redhat.com>; kexec at lists.infradead.org
>> Subject: Re: kdump issues with 4.11 kernel
>>
>> Hi Anil,
>>
>> On Wed, Nov 29, 2017 at 2:44 PM, Gurumurthy, Anil <Anil.Gurumurthy at cavium.com> wrote:
>>> Thanks. That did help get kexec to work.
>>> However, I still do not get a crash dump: echo c > /proc/sysrq-trigger
>>> does not produce one.
>>>
>>> Any thoughts?
>>
>> Can you share the console messages you see when the crash kernel
>> boots? Or do you see nothing after the crash is triggered via
>> echo c > /proc/sysrq-trigger?
>> [Anil] I do not see any messages after triggering the crash.
>
> There could be several reasons for this:
>
> - crashkernel might be missing some arch/machine specific options.
> - It may be that the purgatory SHA verification has failed. If your arch supports a console in purgatory, this is easy to debug.
> - It might be that the crash kernel itself crashed very early. Pass some earlycon/earlyprintk option for your system to the second kernel command line.
> - Also, please share the relevant dmesg logs of the primary kernel boot and the commands you use to load the crash kernel.
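Before triggering a crash, it may help to confirm that memory was actually reserved and that `kexec -p` really loaded a capture kernel. A minimal sketch, assuming the standard Linux interfaces `/proc/cmdline`, `/sys/kernel/kexec_crash_size` (bytes reserved) and `/sys/kernel/kexec_crash_loaded` ("1" once a capture kernel is loaded). The function names are hypothetical; the parsing is separated from the file reads so it can be tried on any machine.

```python
from pathlib import Path


def crash_kernel_status(cmdline: str, crash_loaded: str, crash_size: str) -> dict:
    """Summarise kdump readiness from the text of three kernel interfaces."""
    return {
        # Was a reservation requested on the primary kernel's command line?
        "crashkernel_param": any(t.startswith("crashkernel=") for t in cmdline.split()),
        # Bytes actually reserved; 0 means the reservation did not happen.
        "reserved_bytes": int(crash_size.strip()),
        # "1" once `kexec -p` has successfully loaded a capture kernel.
        "crash_kernel_loaded": crash_loaded.strip() == "1",
    }


def read_live_status(root: Path = Path("/")) -> dict:
    """Read the same values from a running Linux system."""
    return crash_kernel_status(
        (root / "proc/cmdline").read_text(),
        (root / "sys/kernel/kexec_crash_loaded").read_text(),
        (root / "sys/kernel/kexec_crash_size").read_text(),
    )
```

If `reserved_bytes` is 0 or `crash_kernel_loaded` is False, `echo c > /proc/sysrq-trigger` has no capture kernel to jump to, so the problem lies in the reservation or load step rather than in the crash kernel's early boot.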
>
> [Anil] Thanks for the quick response
>
>    This is what I have in the .config (for the primary kernel):
>    CONFIG_EARLY_PRINTK=y
>    CONFIG_EARLY_PRINTK_DBGP=y
>    CONFIG_EARLY_PRINTK_EFI=y
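The .config excerpt above only covers early printk. Since the same kernel image is reused as the capture kernel here, a quick check of the kdump-related options may also be worth doing. A minimal sketch: the option list (CONFIG_KEXEC, CONFIG_CRASH_DUMP, CONFIG_PROC_VMCORE, CONFIG_RELOCATABLE, plus CONFIG_SERIAL_EARLYCON for `earlycon=`) reflects the kernel's kdump documentation as I recall it and is an assumption. Verify it against the documentation for your architecture. The helper names are hypothetical.

```python
KDUMP_OPTIONS = (
    "CONFIG_KEXEC",
    "CONFIG_CRASH_DUMP",
    "CONFIG_PROC_VMCORE",
    "CONFIG_RELOCATABLE",
    "CONFIG_SERIAL_EARLYCON",
)


def parse_kconfig(text: str) -> dict:
    """Map CONFIG_* names to their values; '# CONFIG_X is not set' lines are skipped."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def missing_options(config_text: str, wanted=KDUMP_OPTIONS) -> list:
    """Return the wanted options that are not built in (=y)."""
    opts = parse_kconfig(config_text)
    return [name for name in wanted if opts.get(name) != "y"]
```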
>
> The log for the primary kernel boot:
>
> Nov 29 14:22:22 localhost journal: Runtime journal is using 8.0M (max 1.9G, leaving 2.9G of free 19.4G, current limit 1.9G).
> Nov 29 14:22:22 localhost kernel: Linux version 4.11.12+ (root at localhost.localdomain) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #4 SMP Thu Nov 23 12:11:02 IST 2017
> Nov 29 14:22:22 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-4.11.12+ root=/dev/mapper/rhel-root ro rd.lvm.lv=rhel/swap crashkernel=128M rd.lvm.lv=rhel/root rhgb quiet
> Nov 29 14:22:22 localhost kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> Nov 29 14:22:22 localhost kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> Nov 29 14:22:22 localhost kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> Nov 29 14:22:22 localhost kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> Nov 29 14:22:22 localhost kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
> Nov 29 14:22:22 localhost kernel: e820: BIOS-provided physical RAM map:
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000006bfff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000000006c000-0x000000000006cfff] ACPI NVS
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000000006d000-0x000000000009efff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] ACPI NVS
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000000100000-0x000000005d184fff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005d185000-0x000000005d185fff] ACPI data
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005d186000-0x000000005fb77fff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005fb78000-0x000000005fb7cfff] reserved
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005fb7d000-0x000000005ffdffff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005ffe0000-0x000000005ffe1fff] ACPI data
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005ffe2000-0x000000005fffafff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005fffb000-0x0000000060001fff] ACPI data
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000060002000-0x0000000060009fff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000006000a000-0x000000006000efff] ACPI data
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000006000f000-0x000000006000ffff] usable
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000060010000-0x0000000060011fff] ACPI data
> Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000060012000-0x00000000600ddfff] usable
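The excerpt above ends before any crashkernel reservation message, which is the part of the dmesg log most relevant to kdump. A minimal sketch for summarising a saved log: it parses the `BIOS-e820: [mem START-END] TYPE` lines (ranges are inclusive, hence the `+ 1`) and collects any line that mentions "crashkernel". It does not assume the exact wording of the reservation message. The function names are hypothetical, and the usable total only covers the lines actually present in the log.

```python
import re

E820_LINE = re.compile(
    r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$",
    re.MULTILINE | re.IGNORECASE,
)


def parse_e820(log: str) -> list:
    """Return (start, end, type) tuples for each e820 entry found in the log."""
    return [
        (int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip())
        for m in E820_LINE.finditer(log)
    ]


def usable_bytes(entries: list) -> int:
    """Sum the sizes of 'usable' ranges; e820 end addresses are inclusive."""
    return sum(end - start + 1 for start, end, kind in entries if kind == "usable")


def crashkernel_lines(log: str) -> list:
    """Lines mentioning crashkernel, e.g. the reservation message, if any."""
    return [line for line in log.splitlines() if "crashkernel" in line.lower()]
```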
>
>
> Will try to get the other details you needed too.
>
> -Anil
>
> Regards,
> Bhupesh
>
>
>> Generally, depending on your test machine's arch, it is useful to enable earlycon/earlyprintk to see whether the crash kernel printed any useful messages before the actual console device became operational.
>>
>> Can you try setting earlycon/earlyprintk and share the crash kernel log messages after that?
>>
>> Thanks,
>> Bhupesh
>>
>>> -----Original Message-----
>>> From: Dave Young [mailto:dyoung at redhat.com]
>>> Sent: 29 November 2017 13:09
>>> To: Gurumurthy, Anil <Anil.Gurumurthy at cavium.com>
>>> Cc: kexec at lists.infradead.org
>>> Subject: Re: kdump issues with 4.11 kernel
>>>
>>> Hi,
>>> On 11/29/17 at 05:29am, Gurumurthy, Anil wrote:
>>>> Hello,
>>>>   I was having trouble getting a crash dump on the 4.11 kernel. Debugging a bit, I see that running kexec from the command line fails. Any ideas on what I could be missing?
>>>>
>>>> [root at localhost ~]# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`kdump.img
>>>> ELF core (kcore) parse failed
>>>> Cannot load /boot/vmlinuz-4.11.12+
>>>>
>>>
>>> Can you try below kexec-tools commit:
>>> commit ed15ba1b9977e506637ff1697821d97127b2c919
>>> Author: Pratyush Anand <panand at redhat.com>
>>> Date:   Wed Mar 1 11:19:42 2017 +0530
>>>
>>>     build_mem_phdrs(): check if p_paddr is invalid
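The commit Dave points to is titled "build_mem_phdrs(): check if p_paddr is invalid", and the original failure is "ELF core (kcore) parse failed". As I understand it (an assumption, not stated in the thread), newer kernels mark /proc/kcore PT_LOAD segments that have no physical address with p_paddr = -1, and older kexec-tools rejected them while parsing kcore. The Python sketch below only illustrates the idea of skipping such segments while reading ELF64 program headers. It is not the kexec-tools C code. It assumes a little-endian ELF64 layout (`Elf64_Phdr`, 56 bytes), and the helper name is hypothetical.

```python
import struct

PT_LOAD = 1
# -1 as an unsigned 64-bit value; treated here as "no valid physical address" (assumption).
INVALID_PADDR = 0xFFFFFFFFFFFFFFFF
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align.
PHDR = struct.Struct("<IIQQQQQQ")


def load_segments(phdr_table: bytes, phnum: int) -> list:
    """Return (vaddr, paddr, memsz) for PT_LOAD entries with a usable p_paddr."""
    segments = []
    for i in range(phnum):
        p_type, _flags, _off, vaddr, paddr, _filesz, memsz, _align = PHDR.unpack_from(
            phdr_table, i * PHDR.size)
        if p_type != PT_LOAD:
            continue
        if paddr == INVALID_PADDR:
            # Skip this segment instead of failing the whole kcore parse.
            continue
        segments.append((vaddr, paddr, memsz))
    return segments
```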
>>>
>>> Thanks
>>> Dave
>>>
>>> _______________________________________________
>>> kexec mailing list
>>> kexec at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/kexec