Problem: crashkernel boots at 512MB address in RAM with kexec -l/-e but not with kexec -p

hans.heckel@xxxxxxxxxxxxxxxxxx (HECKEL, Hans (Hans)) · Wed, 18 Nov 2015 07:59:05 +0000

Hi Pratyush,
thanks a lot for looking into my issue.
How do I enable purgatory for the SHA-256 verification?
With respect to limiting the kernel RAM: it oopses if I give only the first
512MB to the production kernel and then 256MB to the crash kernel
("mem=512M crashkernel=128M@512M"). See details below. It seems the crash
kernel region must be *part* of the system RAM. Using "mem=768M
crashkernel=128M@512M" I see the same behavior as in my original post.
Do the segments from the kexec debug output look correct to you? In the
kexec -l/-e case there are only 3 while in the -p case I have 4 segments.
Details are in my original post.
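
One rough way to compare the two segment lists is to check each segment against the reserved crashkernel window and see which one is extra. The Python sketch below is only an illustration built from the `mem`/`memsz` values in the kexec -d output quoted further down; the helper names (`inside`, `overlaps`, `extra_segments`) are hypothetical and not part of kexec-tools.

```python
# Crashkernel window as reported by /proc/iomem: 20000000-27ffffff.
CRASH_START, CRASH_END = 0x20000000, 0x27FFFFFF

# (mem, memsz) pairs copied from the "kexec -l ... -d" output.
LOAD_SEGMENTS = [
    (0x20001000, 0x1000),
    (0x20008000, 0x3DC000),
    (0x20F6E000, 0x1128000),
]

# The "kexec -p ... -d" output lists the same three plus one more.
PANIC_SEGMENTS = LOAD_SEGMENTS + [(0x27F00000, 0x1000)]


def inside(mem, memsz, start=CRASH_START, end=CRASH_END):
    """True if [mem, mem + memsz) lies within the inclusive window."""
    return start <= mem and mem + memsz - 1 <= end


def overlaps(segments):
    """True if any two segments share at least one byte."""
    ordered = sorted(segments)
    return any(
        a_mem + a_sz > b_mem
        for (a_mem, a_sz), (b_mem, _) in zip(ordered, ordered[1:])
    )


def extra_segments(base, other):
    """Segments present in `other` but not in `base`."""
    return [seg for seg in other if seg not in base]


if __name__ == "__main__":
    for mem, memsz in PANIC_SEGMENTS:
        print(hex(mem), hex(memsz), "inside" if inside(mem, memsz) else "OUTSIDE")
    print("overlap:", overlaps(PANIC_SEGMENTS))
    print("extra:", [(hex(m), hex(s)) for m, s in extra_segments(LOAD_SEGMENTS, PANIC_SEGMENTS)])
```

By this check, the only additional segment in the -p case sits at 0x27f00000, which is the address the debug output prints as `elfcorehdr`, and all four segments fall inside the reserved region without overlapping.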
Any other idea as to what to investigate? How could I check for corruption
of the crash kernel?
Again, thanks a lot. Best regards, Hans

[...]
boot_prep_linux commandline=console=ttyS0,38400 earlyprintk=ttyS0
root=/dev/ram rdinit=/sbin/init rw mem=512M crashkernel=128M@512M
[...]

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
------------[ cut here ]------------
kernel BUG at /vol/1830sdw/users/hheckel/maxbcm/linux-3.4-ng/mm/bootmem.c:351!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0    Not tainted  (3.4.91-wr5 #"maxbcm_V01.00.00")
PC is at mark_bootmem+0xd8/0xec
LR is at reserve_bootmem+0x2c/0x30
pc : [<c0737e08>]    lr : [<c0738290>]    psr: 600001d3
sp : c076ff18  ip : 00000001  fp : c076ff54
r10: c074d5a4  r9 : 00028000  r8 : c0752370
r7 : 00020000  r6 : 00020000  r5 : 00020000  r4 : c075235c
r3 : 00000000  r2 : 00000001  r1 : c0752370  r0 : c0752370
Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment kernel
Control: 38c53c7d  Table: 00003000  DAC: fffffffd
Process swapper (pid: 0, stack limit = 0xc076e2f0)
Stack: (0xc076ff18 to 0xc0770000)
ff00:                                                       c076ff64 c076ff28
ff20: c0731f94 00000001 c076ff4c 20000000 c0785cb0 c0868900 c0785cb0 c074d5a4
ff40: 00000000 c074d5a4 c076ff64 c076ff58 c0738290 c0737d3c c076ffbc c076ff68
ff60: c0729868 c0738270 c076ff80 c076ff88 20000000 00000000 ffffffff ffffffff
ff80: 08000000 00000000 20000000 00000000 00000000 00000001 00000000 c074ecb8
ffa0: c0785be4 00007000 562f5842 00000000 c076fff4 c076ffc0 c0725620 c0728f88
ffc0: 00000000 00000000 00000000 00000000 00000000 c074ecbc 00000000 38c73c7d
ffe0: c0780308 c074ecb8 00000000 c076fff8 00008054 c0725598 00000000 00000000
[<c0737e08>] (mark_bootmem+0xd8/0xec) from [<c0738290>] (reserve_bootmem+0x2c/0x30)
[<c0738290>] (reserve_bootmem+0x2c/0x30) from [<c0729868>] (setup_arch+0x8ec/0x9e8)
[<c0729868>] (setup_arch+0x8ec/0x9e8) from [<c0725620>] (start_kernel+0x94/0x3a4)
[<c0725620>] (start_kernel+0x94/0x3a4) from [<00008054>] (0x8054)
Code: e2404014 e2841014 e1580001 1affffd7 (e7f001f2)
---[ end trace 1b75b31a2719ed1c ]---
Kernel panic - not syncing: Fatal exception
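
The register values in the oops can be read as page frame numbers. Assuming 4 KiB pages (an assumption; the page size is not stated in the log), a 128MB reservation at 512MB maps to PFNs 0x20000 to 0x28000, which matches r5/r6/r7 and r9 above, while mem=512M ends the visible RAM at PFN 0x20000. The Python sketch below only reproduces that arithmetic; it is not kernel code, and the function names are made up for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages
MiB = 1 << 20


def pfn_range(base, size):
    """Return (first_pfn, end_pfn) for a physical region, end exclusive."""
    return base >> PAGE_SHIFT, (base + size) >> PAGE_SHIFT


def reservation_fits(ram_size, base, size):
    """True if [base, base + size) lies inside RAM that starts at physical 0."""
    _, ram_end_pfn = pfn_range(0, ram_size)
    _, end_pfn = pfn_range(base, size)
    return end_pfn <= ram_end_pfn


if __name__ == "__main__":
    start, end = pfn_range(512 * MiB, 128 * MiB)
    print("crashkernel PFNs:", hex(start), hex(end))
    print("fits with mem=512M:", reservation_fits(512 * MiB, 512 * MiB, 128 * MiB))
    print("fits with mem=768M:", reservation_fits(768 * MiB, 512 * MiB, 128 * MiB))
```

This is consistent with, though it does not prove, the reading that the reservation was requested for a range lying entirely outside the memory the kernel was told about with mem=512M.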

-----Original Message-----
From: Pratyush Anand [mailto:panand@xxxxxxxxxx] 
Sent: Monday, 16 November 2015 06:35
To: HECKEL, Hans (Hans)
Cc: kexec at lists.infradead.org
Subject: Re: Problem: crashkernel boots at 512MB address in RAM with kexec
-l/-e but not with kexec -p

On 12/11/2015:01:33:18 PM, HECKEL, Hans (Hans) wrote:
> Dear kexec team,
> I hope it is okay to ask you as my public problem description has not 
> yielded any replies so far. My problem is posted here:
> http://unix.stackexchange.com/questions/237580/boot-rescue-kernel-at-high-memory-address-using-kexec-on-arm
> and also copied below (without the formatting). Update: Same result 
> when using kernel 4.3 and kexec-tools 2.0.11.
> Any help is highly appreciated, and thanks for the work you are 
> putting into kexec!
> Best regards,
> Hans Heckel (Alcatel-Lucent, IP Routing and Transport)
> 
> 
> Summary: Crashkernel boots at 512MB address in RAM with kexec -l/-e 
> but not with kexec -p - why?
> 
> Embedded platform with Marvell Armada XP (MV78460) (ARMv7 with 4 
> cores) and 1GB of RAM.
> Production kernel: customized Linux 3.4.91.
> Rescue kernel: clean kernel.org Linux (4.2.3). (I am aware that it uses
> device trees, but that works fine by appending the DTB to the zImage.)
> In user space, I am using the latest kexec-tools (2.0.10).
> 
> History: Using kexec -l (with ramdisk and command line params from 
> 3.4.91-kernel, and --atags) and kexec -e, the rescue kernel boots just 
> fine and seems to place itself in the beginning of RAM (according to 
> /proc/iomem) regardless of what is being set via --mem-min and 
> --mem-max. When reserving space in RAM using the boot-option 
> crashkernel, I have to use a high memory address because otherwise it
> tells me the requested area is already in use.
> So we set crashkernel=128M@512M. The kernel does not boot with kexec -p.
> 
> Current status: I understand that relocatable kernels
> (CONFIG_AUTO_ZRELADDR=y) must reside within the top 128MB which is not 
> possible for us. So I have worked around the standard kernel 
> configuration and forced CONFIG_ARM_PATCH_PHYS_VIRT to no and 
> CONFIG_PHYS_OFFSET to 0x20000000. I had to add a Makefile.boot for the 
> machine where I set zreladdr-y := 0x20008000, params_phys-y := 
> 0x20000100, initrd_phys-y := 0x20800000. Now the kernel still boots 
> fine using kexec -l and kexec -e and according to --mem-min. I can see 
> it is placed at 512MB. However, configuring it with -p and causing a
> panic, the console says "Loading crashdump kernel... Bye!" and remains
> silent forever.
> 
> All files reside in RAM only.
> 
> What could I be doing wrong? Should I worry about the decompression 
> errors (even in the good case)?
> 
> From dmesg:
> Reserving 128MB of memory at 512MB for crashkernel (System RAM: 760MB)
> 
> root@host:~# cat /proc/iomem
> 00000000-3bff9fff : System RAM
>   00008000-00724f43 : Kernel code
>   0076e000-0087553f : Kernel data
>   20000000-27ffffff : Crash kernel
> (some RAM at the end is reserved for persistent storage, that's why it 
> doesn't add up to 1GB)
> 
> Successful case:
> 
> root@host:~# kexec -l -t zImage \
>     --command-line="console=ttyS0,38400 earlyprintk=ttyS0 root=/dev/ram rdinit=/sbin/init rw irqpoll maxcpus=1 reset_devices" \
>     --atags --initrd=./initramfs.cpio.gz -d \
>     --mem-min=0x20000000 --mem-max=0x28000000 ./zImage_fixed_addr
> Try gzip decompression.
> Try LZMA decompression.
> lzma_decompress_file: read on ./zImage_fixed_addr of 65536 bytes failed
> kernel: 0xb6c06008 kernel_size: 0x3db659
> kexec_load: entry = 0x20008000 flags = 0x280000 nr_segments = 3
> segment[0].buf   = 0x40e98
> segment[0].bufsz = 0x3f0
> segment[0].mem   = 0x20001000
> segment[0].memsz = 0x1000
> segment[1].buf   = 0xb6c06008
> segment[1].bufsz = 0x3db659
> segment[1].mem   = 0x20008000
> segment[1].memsz = 0x3dc000
> segment[2].buf   = 0xb5ade008
> segment[2].bufsz = 0x1127516
> segment[2].mem   = 0x20f6e000
> segment[2].memsz = 0x1128000
> root@host:~# kexec -e
> Starting new kernel
> Booting Linux on physical CPU 0x0
> ...
> 
> After boot:
> 
> root@vanilla:~# cat /proc/iomem
> 20000000-3fffffff : System RAM
>   20008000-206dd237 : Kernel code
>   20720000-2078f54f : Kernel data
> 
> Unsuccessful case:
> 
> root@host:~# kexec -p -t zImage \
>     --command-line="console=ttyS0,38400 earlyprintk=ttyS0 root=/dev/ram rdinit=/sbin/init rw irqpoll maxcpus=1 reset_devices" \
>     --atags --initrd=./initramfs.cpio.gz -d \
>     ./zImage_fixed_addr
> Try gzip decompression.
> Try LZMA decompression.
> lzma_decompress_file: read on ./zImage_fixed_addr of 65536 bytes failed
> kernel: 0xb6b69008 kernel_size: 0x3db659
> phys_offset: 0
> kernel symbol _stext vaddr =         c0008240
> page_offset is set to c0000000
> get_crash_notes_per_cpu: crash_notes addr = 10f525c, size = 1024
> Elf header: p_type = 4, p_offset = 0x10f525c p_paddr = 0x10f525c p_vaddr = 0x0 p_filesz = 0x400 p_memsz = 0x400
> get_crash_notes_per_cpu: crash_notes addr = 10ff25c, size = 1024
> Elf header: p_type = 4, p_offset = 0x10ff25c p_paddr = 0x10ff25c p_vaddr = 0x0 p_filesz = 0x400 p_memsz = 0x400
> get_crash_notes_per_cpu: crash_notes addr = 110925c, size = 1024
> Elf header: p_type = 4, p_offset = 0x110925c p_paddr = 0x110925c p_vaddr = 0x0 p_filesz = 0x400 p_memsz = 0x400
> get_crash_notes_per_cpu: crash_notes addr = 111325c, size = 1024
> Elf header: p_type = 4, p_offset = 0x111325c p_paddr = 0x111325c p_vaddr = 0x0 p_filesz = 0x400 p_memsz = 0x400
> vmcoreinfo header: p_type = 4, p_offset = 0x7f1330 p_paddr = 0x7f1330 p_vaddr = 0x0 p_filesz = 0x1000 p_memsz = 0x1000
> Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr = 0xc0000000 p_filesz = 0x20000000 p_memsz = 0x20000000
> Elf header: p_type = 1, p_offset = 0x28000000 p_paddr = 0x28000000 p_vaddr = 0xe8000000 p_filesz = 0x13ffa000 p_memsz = 0x13ffa000
> elfcorehdr: 0x27f00000
> crashkernel: [0x20000000 - 0x27ffffff] (128M)
> memory range: [0 - 0x1fffffff] (512M)
> memory range: [0x28000000 - 0x3bff9fff] (319M)
> kernel command line: "console=ttyS0,38400 earlyprintk=ttyS0 root=/dev/ram rdinit=/sbin/init rw irqpoll maxcpus=1 reset_devices elfcorehdr=0x27f00000 mem=130048K"
> kexec_load: entry = 0x20008000 flags = 0x280001 nr_segments = 4
> segment[0].buf   = 0x416e0
> segment[0].bufsz = 0x410
> segment[0].mem   = 0x20001000
> segment[0].memsz = 0x1000
> segment[1].buf   = 0xb6b69008
> segment[1].bufsz = 0x3db659
> segment[1].mem   = 0x20008000
> segment[1].memsz = 0x3dc000
> segment[2].buf   = 0xb5a41008
> segment[2].bufsz = 0x1127516
> segment[2].mem   = 0x20f6e000
> segment[2].memsz = 0x1128000
> segment[3].buf   = 0x412a0
> segment[3].bufsz = 0x400
> segment[3].mem   = 0x27f00000
> segment[3].memsz = 0x1000
> 
> <cause crash via SysRq>
> 
> Loading crashdump kernel...
> Bye!
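
The `mem=130048K` that kexec appends in the -p case can be reproduced from the numbers in the debug output: it appears to be the distance from the start of the crashkernel region to the `elfcorehdr` address, expressed in KiB. The Python sketch below checks that arithmetic, and also that the two `p_type = 1` headers cover exactly the two reported memory ranges. This interpretation is an assumption drawn from the output above, not from the kexec-tools source, and the helper names are hypothetical.

```python
CRASH_START = 0x20000000   # "crashkernel: [0x20000000 - 0x27ffffff]"
ELFCOREHDR = 0x27F00000    # "elfcorehdr: 0x27f00000"
PAGE_OFFSET = 0xC0000000   # "page_offset is set to c0000000"
PHYS_OFFSET = 0x0          # "phys_offset: 0"

# (p_paddr, p_vaddr, p_memsz) of the two p_type = 1 headers.
LOAD_HEADERS = [
    (0x00000000, 0xC0000000, 0x20000000),
    (0x28000000, 0xE8000000, 0x13FFA000),
]

# Inclusive "memory range" lines from the same output.
MEMORY_RANGES = [(0x0, 0x1FFFFFFF), (0x28000000, 0x3BFF9FFF)]


def usable_kib(crash_start, elfcorehdr):
    """Memory below the ELF core header inside the crash region, in KiB."""
    return (elfcorehdr - crash_start) // 1024


def header_range(paddr, memsz):
    """Inclusive physical range described by one program header."""
    return paddr, paddr + memsz - 1


def expected_vaddr(paddr):
    """Linear-map virtual address for a physical address."""
    return paddr - PHYS_OFFSET + PAGE_OFFSET


if __name__ == "__main__":
    print("mem=%dK" % usable_kib(CRASH_START, ELFCOREHDR))
    for paddr, vaddr, memsz in LOAD_HEADERS:
        lo, hi = header_range(paddr, memsz)
        print(hex(lo), hex(hi), hex(vaddr), vaddr == expected_vaddr(paddr))
```

Under these assumptions the command-line value and the program headers are self-consistent, so the debug output by itself does not point to a wrong layout.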

Although I am not sure, it may be that your first kernel is corrupting the
crash kernel, so you do not see any output even with earlyprintk enabled.
[It seems you are not using purgatory for SHA-256 verification.]

Can you please try limiting the memory visible to the first kernel (pass
mem=512M on the first kernel's command line) and see if that improves things?

~Pratyush
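
For context on the SHA-256 remark: the general idea is that a digest of the loaded segments is recorded at load time and recomputed just before the new kernel is entered, so any modification in between is detected. The Python sketch below illustrates that idea only. It is not the purgatory code from kexec-tools, and `digest_segments` and the sample byte strings are hypothetical placeholders.

```python
import hashlib


def digest_segments(segments):
    """SHA-256 over the contents of (address, data) segments in address order."""
    h = hashlib.sha256()
    for _addr, data in sorted(segments):
        h.update(data)
    return h.hexdigest()


# Placeholder contents standing in for the real segment buffers.
loaded = [
    (0x20001000, b"atags"),
    (0x20008000, b"zImage"),
    (0x20F6E000, b"initrd"),
]

# Digest recorded when the segments are loaded.
recorded = digest_segments(loaded)

# Simulate one byte of the kernel segment being overwritten later.
corrupted = list(loaded)
corrupted[1] = (0x20008000, b"zImagX")

if __name__ == "__main__":
    print("intact matches:   ", digest_segments(loaded) == recorded)
    print("corrupted matches:", digest_segments(corrupted) == recorded)
```

A mismatch at the second step would indicate that the crash kernel image was modified after it was loaded, which is the kind of corruption suspected here.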