kdump issues with 4.11 kernel

Anil.Gurumurthy@xxxxxxxxxx (Gurumurthy, Anil) · Thu, 30 Nov 2017 12:29:16 +0000

Hi Bhupesh,
  I tried to get some log messages from the crash kernel, but was unable to get anything.
echo c > /proc/sysrq-trigger
simply hangs without any messages on the console.
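
Before triggering the crash it may be worth confirming that a crash kernel is actually loaded at that moment. A minimal sketch, assuming the standard sysfs flag /sys/kernel/kexec_crash_loaded; the function name and its optional path argument are hypothetical, added only so the check can be tried against a test file:

```shell
#!/bin/sh
# Return success (0) only if the kernel reports a loaded crash kernel.
# $1 (hypothetical, optional): path to the flag file; defaults to the sysfs entry.
crash_kernel_loaded() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    # The flag reads "1" when a crash kernel has been loaded with kexec -p.
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Example use (commented out so the sketch has no side effects):
# crash_kernel_loaded && echo c > /proc/sysrq-trigger
```

If the function returns non-zero, the kexec -p load step needs to be repeated before the sysrq trigger can be expected to reach a second kernel.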

Thanks,
Anil
-----Original Message-----
From: Gurumurthy, Anil 
Sent: 29 November 2017 16:02
To: 'Bhupesh Sharma' <bhsharma at redhat.com>
Cc: Dave Young <dyoung at redhat.com>; kexec at lists.infradead.org
Subject: RE: kdump issues with 4.11 kernel

-----Original Message-----
From: Bhupesh Sharma [mailto:bhsharma@xxxxxxxxxx]
Sent: 29 November 2017 15:50
To: Gurumurthy, Anil <Anil.Gurumurthy at cavium.com>
Cc: Dave Young <dyoung at redhat.com>; kexec at lists.infradead.org
Subject: Re: kdump issues with 4.11 kernel

On Wed, Nov 29, 2017 at 3:36 PM, Gurumurthy, Anil <Anil.Gurumurthy at cavium.com> wrote:
>
>
> -----Original Message-----
> From: Bhupesh Sharma [mailto:bhsharma at redhat.com]
> Sent: 29 November 2017 15:16
> To: Gurumurthy, Anil <Anil.Gurumurthy at cavium.com>
> Cc: Dave Young <dyoung at redhat.com>; kexec at lists.infradead.org
> Subject: Re: kdump issues with 4.11 kernel
>
> Hi Anil,
>
> On Wed, Nov 29, 2017 at 2:44 PM, Gurumurthy, Anil <Anil.Gurumurthy at cavium.com> wrote:
>> Thanks. That did help getting kexec to work.
>> However, I still do not get a crash dump: echo c > /proc/sysrq-trigger
>> does not produce one.
>>
>> Any thoughts?
>
> Can you share the console messages you see when the crash kernel
> boots? Or do you see nothing after the crash is introduced via
> echo c > /proc/sysrq-trigger?
> [Anil]  I do not see any messages after introducing the crash.

There could be several reasons for this:

- crashkernel might be missing some arch/machine specific options.
- It may be that the purgatory sha verification has failed. If your arch supports a console in purgatory then it is easy to debug this.
- It might be that the crash kernel itself crashed very early. Pass some earlycon/earlyprintk option for your system to the second kernel command line.
- Also please share the relevant dmesg log of the primary kernel boot and the commands you use to load the crash kernel.
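
The earlyprintk suggestion above concerns the second kernel's command line, not the primary one. A rough sketch of how such a command line could be derived from the primary kernel's: the function name is hypothetical, and the serial console on ttyS0 at 115200 baud is an assumption that has to match the actual machine. The irqpoll, maxcpus=1 and reset_devices parameters are the ones commonly recommended for dump-capture kernels.

```shell
#!/bin/sh
# Build a command line for the crash (second) kernel from the primary one.
# ASSUMPTION: serial console on ttyS0 at 115200 baud; adjust for the real hardware.
build_kdump_cmdline() {
    out=""
    for tok in $1; do
        case "$tok" in
            # Drop tokens that suppress console output or only apply to the primary kernel.
            BOOT_IMAGE=*|crashkernel=*|rhgb|quiet) ;;
            *) out="${out:+$out }$tok" ;;
        esac
    done
    extra="earlyprintk=serial,ttyS0,115200 console=ttyS0,115200 irqpoll maxcpus=1 reset_devices"
    printf '%s\n' "${out:+$out }$extra"
}

# Example use (commented out; loading a crash kernel needs root):
# kexec -p "/boot/vmlinuz-$(uname -r)" \
#       --initrd="/boot/initramfs-$(uname -r)kdump.img" \
#       --append="$(build_kdump_cmdline "$(cat /proc/cmdline)")"
```

Removing rhgb and quiet matters here because both reduce what reaches the console during boot.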

[Anil] Thanks for the quick response

   This is what I have in the .config (for the primary kernel):
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_EFI=y
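
These options only build the early printk support into the kernel; an earlyprintk= parameter still has to be passed on the command line of whichever kernel should produce the output. Since the same image is used as the crash kernel here, its config also needs the kdump-related options. A small sketch that lists which of a few such options are not set to y in a given config file; the function name is hypothetical and the option list is a partial selection, not a complete requirement set:

```shell
#!/bin/sh
# Print every option from a short kdump-related list that is not "=y" in the config.
# $1: path to a kernel config file (for example the .config in the build tree).
missing_kdump_options() {
    config_file="$1"
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_EARLY_PRINTK; do
        # Match the exact "OPTION=y" line; "# OPTION is not set" does not match.
        grep -q "^${opt}=y\$" "$config_file" || echo "$opt"
    done
}

# Example use (the path is an assumption about where the config is installed):
# missing_kdump_options "/boot/config-$(uname -r)"
```

An empty result means all four options are enabled in that file.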

The log for the primary kernel boot:

Nov 29 14:22:22 localhost journal: Runtime journal is using 8.0M (max 1.9G, leaving 2.9G of free 19.4G, current limit 1.9G).
Nov 29 14:22:22 localhost kernel: Linux version 4.11.12+ (root at localhost.localdomain) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #4 SMP Thu Nov 23 12:11:02 IST 2017
Nov 29 14:22:22 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-4.11.12+ root=/dev/mapper/rhel-root ro rd.lvm.lv=rhel/swap crashkernel=128M rd.lvm.lv=rhel/root rhgb quiet
Nov 29 14:22:22 localhost kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Nov 29 14:22:22 localhost kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Nov 29 14:22:22 localhost kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Nov 29 14:22:22 localhost kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Nov 29 14:22:22 localhost kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Nov 29 14:22:22 localhost kernel: e820: BIOS-provided physical RAM map:
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000006bfff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000000006c000-0x000000000006cfff] ACPI NVS
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000000006d000-0x000000000009efff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] ACPI NVS
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000000100000-0x000000005d184fff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005d185000-0x000000005d185fff] ACPI data
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005d186000-0x000000005fb77fff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005fb78000-0x000000005fb7cfff] reserved
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005fb7d000-0x000000005ffdffff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005ffe0000-0x000000005ffe1fff] ACPI data
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005ffe2000-0x000000005fffafff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000005fffb000-0x0000000060001fff] ACPI data
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000060002000-0x0000000060009fff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000006000a000-0x000000006000efff] ACPI data
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x000000006000f000-0x000000006000ffff] usable
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000060010000-0x0000000060011fff] ACPI data
Nov 29 14:22:22 localhost kernel: BIOS-e820: [mem 0x0000000060012000-0x00000000600ddfff] usable
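
The command line in this log requests crashkernel=128M. Whether the reservation actually succeeded can usually be read from /proc/iomem, where a reserved region appears as a "Crash kernel" range. A sketch that extracts the size of that range; the function name and its optional file argument are hypothetical:

```shell
#!/bin/sh
# Print the size in MiB of the "Crash kernel" range listed in an iomem-style file.
# $1 (hypothetical, optional): file to read; defaults to /proc/iomem.
# Read it as root: unprivileged reads of /proc/iomem show zeroed addresses.
crash_reservation_mib() {
    iomem_file="${1:-/proc/iomem}"
    # Lines look like "  <start>-<end> : Crash kernel" with hex addresses.
    range=$(sed -n 's/^ *\([0-9a-f]*-[0-9a-f]*\) : Crash kernel$/\1/p' "$iomem_file" | head -n 1)
    [ -n "$range" ] || return 1
    start=${range%-*}
    end=${range#*-}
    # The end address is inclusive, hence the +1.
    echo $(( (0x$end - 0x$start + 1) / 1048576 ))
}

# Example use:
# crash_reservation_mib || echo "no crash kernel memory reserved"
```

A non-zero return means no "Crash kernel" range was found, in which case kexec -p has no reserved memory to load into.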

Will try to get the other details you needed too.

-Anil

Regards,
Bhupesh

> Generally, depending on your test machine arch, it is useful to pass earlycon/earlyprintk to see whether the crash kernel printed any messages before the actual console device became operational.
>
> Can you try setting earlycon/earlyprintk and share the crash kernel log messages after that?
>
> Thanks,
> Bhupesh
>
>> -----Original Message-----
>> From: Dave Young [mailto:dyoung at redhat.com]
>> Sent: 29 November 2017 13:09
>> To: Gurumurthy, Anil <Anil.Gurumurthy at cavium.com>
>> Cc: kexec at lists.infradead.org
>> Subject: Re: kdump issues with 4.11 kernel
>>
>> Hi,
>> On 11/29/17 at 05:29am, Gurumurthy, Anil wrote:
>>> Hello,
>>>   I was facing trouble getting a crash dump on 4.11 kernel. Debugging a bit, I see that the kexec run from the cmd line fails. Any ideas on what I could be missing?
>>>
>>> [root at localhost ~]# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`kdump.img
>>> ELF core (kcore) parse failed
>>> Cannot load /boot/vmlinuz-4.11.12+
>>>
>>
>> Can you try below kexec-tools commit:
>> commit ed15ba1b9977e506637ff1697821d97127b2c919
>> Author: Pratyush Anand <panand at redhat.com>
>> Date:   Wed Mar 1 11:19:42 2017 +0530
>>
>>     build_mem_phdrs(): check if p_paddr is invalid
>>
>> Thanks
>> Dave
>>