On 05/30/2018 03:50 PM, Jin, Yanjiang wrote:
-----Original Message-----
From: Bhupesh Sharma [mailto:bhsharma@xxxxxxxxxx]
Sent: 30 May 2018 16:39
To: Jin, Yanjiang <yanjiang.jin@xxxxxxxxxxxxxxxx>; Pratyush Anand
<pratyush.anand@xxxxxxxxx>
Cc: kexec@xxxxxxxxxxxxxxxxxxx; jinyanjiang@xxxxxxxxx; horms@xxxxxxxxxxxx;
Zheng, Joey <yu.zheng@xxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
Hi Yanjiang,
On 05/30/2018 01:09 PM, Jin, Yanjiang wrote:
-----Original Message-----
From: Pratyush Anand [mailto:pratyush.anand@xxxxxxxxx]
Sent: 30 May 2018 12:16
To: Jin, Yanjiang <yanjiang.jin@xxxxxxxxxxxxxxxx>
Cc: kexec@xxxxxxxxxxxxxxxxxxx; jinyanjiang@xxxxxxxxx;
horms@xxxxxxxxxxxx
Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
Hi Yanjiang,
On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
<yanjiang.jin@xxxxxxxxxxxxxxxx>
wrote:
Hi Pratyush,
Thanks for your help, but please see my reply inline.
[...]
If an application, for example vmcore-dmesg, wants to access a kernel symbol located in the last 2M of memory, it fails with the error below:
"No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
I think the fix might not be correct.
The problem is in vmcore-dmesg, and that is what should be fixed, not kexec.
See here (https://git.kernel.org/pub/scm/utils/kernel/kexec-tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
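For reference, the lookup behind that error message boils down to walking the PT_LOAD entries taken from /proc/vmcore's program headers; a simplified sketch of the idea (structure and variable names here are illustrative, not the exact vmcore-dmesg code):

#include <stdio.h>

/* Illustrative sketch: map a kernel virtual address to a file offset in
 * the vmcore by walking the PT_LOAD ranges. If a region (e.g. the first
 * 2M of RAM) is missing from the program headers, the lookup falls
 * through to the "No program header covering vaddr" error path.
 */
struct pt_load {
	unsigned long long virt_start;
	unsigned long long virt_end;
	unsigned long long file_offset;
};

static struct pt_load pt_loads[64];
static int num_pt_loads;

static long long vaddr_to_offset(unsigned long long vaddr)
{
	int i;

	for (i = 0; i < num_pt_loads; i++) {
		if (vaddr >= pt_loads[i].virt_start &&
		    vaddr < pt_loads[i].virt_end)
			return pt_loads[i].file_offset +
			       (vaddr - pt_loads[i].virt_start);
	}
	fprintf(stderr,
		"No program header covering vaddr 0x%llx found kexec bug?\n",
		vaddr);
	return -1;
}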
Firstly, for my patch, vmcore-dmesg is just an auxiliary application used to reproduce this issue. The code that generates the vmcore is the root cause.
...and the function which generates the vmcore is not kexec but rather the secondary kernel.
On the other hand, vmcore-dmesg lives under kexec-tools; it has no standalone git repo. Even if we want to fix vmcore-dmesg, we still need to send the patch to kexec-tools, right?
Sure. I meant the `kexec` application. We have three applications in kexec-tools: `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump is useless and we are going to get rid of it very soon.]
Yanjiang
As for how symbols are extracted from the vmcore: you do have the "NUMBER(PHYS_OFFSET)=" information in the vmcore. You can look at the makedumpfile code to see how the information is extracted from "NUMBER".
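For illustration, those "NUMBER(...)=" entries are plain-text lines in the VMCOREINFO ELF note, so pulling out PHYS_OFFSET is essentially string matching; a rough, self-contained sketch (not the actual makedumpfile code, the names below are made up for the example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rough sketch: parse "NUMBER(PHYS_OFFSET)=0x..." out of a NUL-terminated
 * copy of the VMCOREINFO note body. Returns 0 on success.
 */
static int read_vmcoreinfo_number(const char *vmcoreinfo, const char *name,
				  unsigned long long *value)
{
	char key[128];
	const char *p;

	snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	p = strstr(vmcoreinfo, key);
	if (!p)
		return -1;

	*value = strtoull(p + strlen(key), NULL, 0);
	return 0;
}

/* Usage:
 *   unsigned long long phys_offset;
 *   if (!read_vmcoreinfo_number(note_text, "PHYS_OFFSET", &phys_offset))
 *       printf("PHYS_OFFSET = 0x%llx\n", phys_offset);
 */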
I have looked at makedumpfile before; NUMBER(number) just reads a number from the vmcore. But as I showed before, the root issue is that the vmcore contains a wrong number. My patch fixes the vmcore generation issue; we can't read the vmcore at this point since we don't have a vmcore yet.
...and IIUC, you were able to get correctly all the way to the point in the secondary kernel where you tried vmcore-dmesg, and only then did you hit the issue, right?
How did you conclude that the vmcore contains a wrong number? It's unlikely, but if it does, then we have a problem somewhere in the Linux kernel, not here.
Hi Pratyush,
I think I have found the root cause. In the Linux kernel, memblock_mark_nomap() marks some EFI memory ranges, such as EFI_RUNTIME_SERVICES_DATA and EFI_BOOT_SERVICES_DATA, as NOMAP. In my environment, the first 2M of memory is EFI_RUNTIME_SERVICES_DATA, so it can't be seen by the kernel. We also can't mark this EFI memory as "reserved"; only EFI_ACPI_RECLAIM_MEMORY memory gets marked "reserved" and is visible to the kernel.
So I don't think this is a kernel issue; we should fix it in kexec-tools.
The kernel's call path is attached for reference.
drivers/firmware/efi/arm-init.c
efi_init()->reserve_regions()->memblock_mark_nomap()
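For context, the shape of that path is roughly the following loop in reserve_regions() (heavily simplified from drivers/firmware/efi/arm-init.c of that era, so helper names and details may differ by kernel version): usable regions become System RAM, regions the kernel cannot use as normal RAM are marked NOMAP, and only EFI_ACPI_RECLAIM_MEMORY is additionally memblock_reserve()d:

/* Simplified excerpt, not the exact kernel code. */
for_each_efi_memory_desc(md) {
	u64 paddr = md->phys_addr;
	u64 size  = md->num_pages << EFI_PAGE_SHIFT;

	if (is_memory(md)) {
		early_init_dt_add_memory_arch(paddr, size);

		/* Regions the kernel cannot use as normal RAM (e.g. the
		 * runtime services data seen here) stay out of the linear
		 * map, so the first kernel never exposes them.
		 */
		if (!is_usable_memory(md))
			memblock_mark_nomap(paddr, size);

		/* Keep ACPI reclaim memory intact for kexec etc. */
		if (md->type == EFI_ACPI_RECLAIM_MEMORY)
			memblock_reserve(paddr, size);
	}
}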
Hi Bhupesh,
I guess your environment has no EFI support, or the first memblock is not
reserved for EFI, so you can't reproduce this issue.
Perhaps you missed reading my earlier threads on the subject of
EFI_ACPI_RECLAIM_MEMORY regions being mapped as NOMAP and how it
causes the crashkernel to panic (please go through [1]).
As of now we haven't found an acceptable-to-all solution for the issue, and it needs to be fixed in 'kexec-tools' with a minor fix on the kernel side as well.
So, coming back to my environment details, it has both EFI support and EFI ACPI RECLAIM regions.
However, we may be hitting a special case in your environment, so before we can discuss your patch further (as both Pratyush and I have concerns with it), I would request you to share the following:
- output of kernel dmesg with 'efi=debug' added in the bootargs (which will help us see how the memblocks are marked in your setup - I am specifically interested in the logs after the line 'Processing EFI memory map'),
I did more investigation on my board. I believe the firmware design leads to these differences between our environments:
My firmware defines the first two EFI blocks as below:
Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
But the EFI API won't return the "EfiReservedMemType" memory to the Linux kernel for security reasons, so the kernel can't get any info about the first memory block; the kernel can only see Region2, as below:
efi: Processing EFI memory map:
efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
# head -1 /proc/iomem
00200000-0021ffff : reserved
I have the same case on boards at my end:
# head -1 /proc/iomem
00200000-0021ffff : reserved
# dmesg | grep -i "Processing EFI memory map" -A 5
[ 0.000000] efi: Processing EFI memory map:
[ 0.000000] efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000000400000-0x0000005fffff [ACPI Memory NVS | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x000000800000-0x00000081ffff [ACPI Memory NVS | | | | | | | | | | | |UC]
[ 0.000000] efi: 0x000000820000-0x000001600fff [Conventional Memory| | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000001601000-0x0000027fffff [Loader Data | | | | | | | | |WB|WT|WC|UC]
So, no, your environment is not a special one (I also use ATF as the EL3 boot firmware); see more below ...
There are many EfiReservedMemType regions in an ARM64 firmware if it supports TrustZone, but if the firmware doesn't put this type of memory region at the start of physical memory, this error won't happen. I don't think the firmware is in error, since it is free to reserve any memory regions; we'd better update kexec-tools.
Anyway, reading memstart_addr from /dev/mem always gets the correct value if DEVMEM is enabled.
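To make that concrete, a minimal sketch of the /dev/mem read, assuming the physical address of memstart_addr has already been derived elsewhere (PHYS_ADDR below is only a placeholder; deriving the real address is the hard part and is not shown here):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal sketch: read an 8-byte value (e.g. the kernel's memstart_addr)
 * from /dev/mem at a known physical address. Needs CONFIG_DEVMEM (and no
 * STRICT_DEVMEM restriction on that range) plus root privileges.
 */
#define PHYS_ADDR 0x0ULL	/* placeholder, not a real address */

int main(void)
{
	uint64_t value;
	int fd = open("/dev/mem", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/mem");
		return 1;
	}
	if (pread(fd, &value, sizeof(value), PHYS_ADDR) != sizeof(value)) {
		perror("pread");
		close(fd);
		return 1;
	}
	printf("value at 0x%llx: 0x%llx\n",
	       (unsigned long long)PHYS_ADDR,
	       (unsigned long long)value);
	close(fd);
	return 0;
}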
.. On my side, with the latest upstream kernel (with commit f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a reverted, to allow the crashkernel to boot while accessing ACPI tables) and the latest upstream kexec-tools, I can boot the crashkernel properly, collect the vmcore properly, and also analyze the crash dump via tools like gdb and crash.
So, I will also try the vmcore-dmesg tool and see if I hit any issues with it. Till then, you can check whether there are any other obvious differences in your environment which might be causing this issue at your end.
Thanks,
Bhupesh
- if you are using a public arm64 platform maybe you can share the CONFIG file,
- output of 'cat /proc/iomem'
[1] https://www.spinics.net/lists/arm-kernel/msg616632.html
Thanks,
Bhupesh
Have you tried extracting "PHYS_OFFSET" from the vmcore, either in vmcore-dmesg or in makedumpfile, and found that it does not match the value of "PHYS_OFFSET" from the first kernel?
In my understanding the flow is like this:
- The first kernel will have a reserved area for the secondary kernel, as well as for the elfcore.
- The first kernel will embed all the vmcore information notes into the elfcore (see crash_save_vmcoreinfo_init() -> arch_crash_save_vmcoreinfo(); a sketch of the arm64 hook follows below). Therefore, we will have the PHYS_OFFSET, kimage_voffset and VA_BITS information of the first kernel in the vmcore, which lives in separate memory and can be read by the second kernel.
- The elfcore will also have notes about all the other physical memory of the first kernel which needs to be copied by the second kernel.
- Now, when a crash happens, the second kernel should have all the required info for reading symbols from the first kernel's physical memory, no?
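As a reference for the second bullet, the arm64 hook that puts those values into the vmcoreinfo note looked roughly like this at the time (see arch/arm64/kernel/machine_kexec.c; the exact contents may differ between kernel versions):

/* Approximate contents of the arm64 arch_crash_save_vmcoreinfo() of
 * that era: it appends PHYS_OFFSET, kimage_voffset and VA_BITS to the
 * vmcoreinfo note that the first kernel exports for the dump tools.
 */
void arch_crash_save_vmcoreinfo(void)
{
	VMCOREINFO_NUMBER(VA_BITS);
	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
			      kimage_voffset);
	vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
			      PHYS_OFFSET);
}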
NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
Yanjiang
Once you know the real PHYS_OFFSET (which could have been randomized if KASLR is enabled), you can fix the problem you are seeing.
I have validated both with and without KASLR; both worked well after applying my patch.
IMHO, even if that works, it does not mean that it's a good fix. We should try to find the root cause. Moreover, you might not have /dev/mem available in every configuration where KASLR is enabled.
Regards
Pratyush
_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec