Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP

Hi Ard, Akashi,

On 11/14/2017 04:50 PM, Ard Biesheuvel wrote:
On 13 November 2017 at 09:27, AKASHI Takahiro
<takahiro.akashi@xxxxxxxxxx> wrote:
Hi,

On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote:
Resent with Akashi's correct email address.

On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma@xxxxxxxxxx> wrote:
Hi Ard, Akashi

I have met an issue on an arm64 board using the latest master branch from Linus.
  (snip)

8. Also, I think now the crashkernel handling changed by
e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
memblock regions explicitly in iomem), needs to be changed to handle
the change added by Ard to fix this issue on ACPI only machines.

I have a dirty hack in place, but I would like to have your opinions
about what can be a more concrete fix to this issue (as we mark these
regions as System RAM now rather than NOMAP) and I don't have a DTB
based machine to test on currently.

I don't know much about acpi reclaim regions,
can you please tell me how your change affects your panic case?

Sorry, I was away yesterday and couldn't get back to you with the details of the dirty hack. I see Ard has already proposed the following change, which looks similar to the change I made locally. However, it doesn't seem to fix the issue completely at my end so far.

Here are more details:


Does this help at all?

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 7768423b39d3..61d867647cca 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -213,7 +213,7 @@ static void __init request_standard_resources(void)

        for_each_memblock(memory, region) {
                res = alloc_bootmem_low(sizeof(*res));
-               if (memblock_is_nomap(region)) {
+               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
                        res->name  = "reserved";
                        res->flags = IORESOURCE_MEM;
                } else {


So, I tried using the 'memblock_is_reserved' check in 'request_standard_resources'. However, since 'memblock_is_reserved' expects a physical address as its input argument, I changed mine to something like this:

-               if (memblock_is_nomap(region)) {
+               if (memblock_is_nomap(region) || memblock_is_reserved(__pfn_to_phys(memblock_region_reserved_base_pfn(region)))) {

However, I am still hitting the issue, and it is quite a peculiar one. First, some more background on what is happening on the Huawei Taishan arm64 board that I have:

1a. I see from the boot logs that one of the ACPI tables (DSDT) is at physical address 0x39710000:

# dmesg | grep -i "DSDT"
[ 0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI HIP07 00000000 INTL 20151124)

1b. This DSDT table is correctly marked as ACPI Reclaim Memory. However, I see that just preceding this entry there is also a 'Boot Code' entry covering '0x0000396c0000-0x00003970ffff':

# dmesg | grep -B 2 -i "ACPI reclaim"
[ 0.000000] efi: 0x000039670000-0x0000396bffff [Runtime Code |RUN| | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]

2. Now, I am not sure which kernel layer is responsible for this (I am still digging), but I see that the 'Boot Code' and ACPI DSDT table regions are somehow merged into one memblock_region and appear as the range '396c0000-3975ffff' in the '/proc/iomem' interface:

# cat /proc/iomem | grep -A 2 -B 2 39
00000000-3961ffff : System RAM
  00080000-00b6ffff : Kernel code
  00cb0000-0167ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : System RAM
39760000-3976ffff : reserved
39770000-397affff : reserved
397b0000-3989ffff : reserved
398a0000-398bffff : reserved
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM
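
One possible explanation for the merge in point 2, offered as an assumption rather than a confirmed diagnosis: memblock coalesces neighbouring regions that are physically adjacent and carry the same attributes. The userspace sketch below mimics that behaviour. 'struct region', 'merge_regions' and the flag values (0 = normal, 1 = NOMAP) are simplified stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct memblock_region (hypothetical). */
struct region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;	/* 0 = normal, 1 = NOMAP in this sketch */
};

/*
 * Coalesce neighbours in a sorted array when they touch each other and
 * share the same flags. Returns the new number of regions.
 */
static size_t merge_regions(struct region *r, size_t cnt)
{
	size_t i = 0;

	assert(r != NULL);
	while (i + 1 < cnt) {
		if (r[i].base + r[i].size != r[i + 1].base ||
		    r[i].flags != r[i + 1].flags) {
			i++;		/* not mergeable, move on */
			continue;
		}
		/* absorb the neighbour and close the gap it leaves */
		r[i].size += r[i + 1].size;
		memmove(&r[i + 1], &r[i + 2], (cnt - (i + 2)) * sizeof(*r));
		cnt--;
	}
	return cnt;
}
```

Fed with the three EFI entries from 1b (assuming Runtime Code is NOMAP and the other two are plain memory), this leaves two regions, the second one spanning 0x396c0000-0x3975ffff, which matches the System RAM line in the iomem output.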

3. As to why this merged region appears as a System RAM area rather than a reserved one, the following code path explains it:

3a. The check we added in 'arch/arm64/kernel/setup.c' doesn't handle the ACPI DSDT table properly and fails to mark it as 'reserved'. This is because 'memblock_is_reserved' calls 'memblock_search' internally, which is currently implemented as:

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
{
	unsigned int left = 0, right = type->cnt;

	do {
		unsigned int mid = (right + left) / 2;

		if (addr < type->regions[mid].base)
			right = mid;
		else if (addr >= (type->regions[mid].base +
				  type->regions[mid].size))
			left = mid + 1;
		else
			return mid;
	} while (left < right);
	return -1;
}

3b. The 'addr' passed to 'memblock_search' is calculated via '__pfn_to_phys(memblock_region_memory_base_pfn(region))', which in this case is 0x396c0000 (see the iomem entry in point 2 above). So we never see that part of this memblock is reserved for the ACPI DSDT entry at 0x39710000.
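
To illustrate point 3b: a point lookup at the region's base address misses a reservation that starts in the middle of the region, whereas an overlap test over the whole range catches it. This is a userspace sketch with simplified stand-in types and hypothetical function names. The kernel does offer 'memblock_is_region_reserved(base, size)' for range queries, but whether that is the right fix here is untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one entry of the reserved list (hypothetical). */
struct range {
	uint64_t base;
	uint64_t size;
};

/* Point query: is 'addr' inside any range? */
static bool addr_in_ranges(const struct range *r, size_t cnt, uint64_t addr)
{
	for (size_t i = 0; i < cnt; i++)
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			return true;
	return false;
}

/* Range query: does [base, base + size) intersect any range? */
static bool span_overlaps_ranges(const struct range *r, size_t cnt,
				 uint64_t base, uint64_t size)
{
	assert(size > 0);
	for (size_t i = 0; i < cnt; i++)
		if (base < r[i].base + r[i].size && r[i].base < base + size)
			return true;
	return false;
}
```

With a single reserved entry at 0x39710000 (its length is an assumption, for example the DSDT length 0x6656), 'addr_in_ranges()' returns false for 0x396c0000 while 'span_overlaps_ranges()' returns true for the merged region 0x396c0000 + 0xa0000. Note that a range check would flag the whole merged region, Boot Code included, so it is a coarse test.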

4. Now, when we run kexec-tools to load a crashdump kernel, it doesn't find an entry for the ACPI DSDT table in a reserved range (it finds it in a System RAM range instead):

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d

...
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 00000000396c0000 - 000000003975ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 0000000039770000 - 00000000397affff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398a0000 - 00000000398bffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
..
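
The debug output above lists the ranges read back from /proc/iomem. As a rough illustration only (this is not the kexec-tools implementation, and the struct and function names are hypothetical), the sketch below parses one iomem line and checks whether a given physical address falls inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct iomem_entry {
	uint64_t start;
	uint64_t end;		/* inclusive, as printed in /proc/iomem */
	char name[64];
};

/* Parse a line such as "396c0000-3975ffff : System RAM". */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	unsigned long long start, end;
	int consumed = 0;

	assert(line != NULL && e != NULL);
	/* %n records where the resource name begins */
	if (sscanf(line, "%llx-%llx : %n", &start, &end, &consumed) != 2 ||
	    consumed == 0)
		return false;

	e->start = start;
	e->end = end;
	snprintf(e->name, sizeof(e->name), "%s", line + consumed);
	e->name[strcspn(e->name, "\n")] = '\0';	/* drop trailing newline */
	return true;
}

/* True if 'addr' lies within the entry's inclusive range. */
static bool entry_contains(const struct iomem_entry *e, uint64_t addr)
{
	return addr >= e->start && addr <= e->end;
}
```

Parsing "396c0000-3975ffff : System RAM" and asking whether it contains 0x39710000 yields true with the name "System RAM", which is the situation described in point 4: the DSDT address is only ever seen as part of a System RAM range.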

5. Now, when a crash is triggered to boot the crashkernel, we see it panic while trying to access the ACPI tables (note that the logs below have been snipped for clarity):

# echo c > /proc/sysrq-trigger

...
[  419.495621] Bye!
...
[ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
...
[ 0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI HIP07 00000000 INTL 20151124)
...
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000010200000-0x00000000301fffff]
[    0.000000]   node   0: [mem 0x0000000039620000-0x00000000396bffff]
[    0.000000]   node   0: [mem 0x0000000039760000-0x000000003976ffff]
[    0.000000]   node   0: [mem 0x00000000397b0000-0x000000003989ffff]
[    0.000000]   node   0: [mem 0x00000000398c0000-0x0000000039d3ffff]
[    0.000000]   node   0: [mem 0x000000003ed30000-0x000000003ed5ffff]
...
[    0.039309] ACPI: Core revision 20170728
[    0.044383] Unable to handle kernel paging request at virtual address ffff000009f10027
[    0.052386] Mem abort info:
[    0.055201]   Exception class = DABT (current EL), IL = 32 bits
[    0.061179]   SET = 0, FnV = 0
[    0.064258]   EA = 0, S1PTW = 0
[    0.067424] Data abort info:
[    0.070326]   ISV = 0, ISS = 0x00000021
[    0.074195]   CM = 0, WnR = 0
[    0.077187] swapper pgtable: 64k pages, 48-bit VAs, pgd = ffff000009650000
[    0.084133] [ffff000009f10027] *pgd=00000000301d0003, *pud=00000000301d0003, *pmd=00000000301c0003, *pte=00e8000039710707
[    0.095215] Internal error: Oops: 96000021 [#1] SMP
[    0.100139] Modules linked in:
[    0.103219] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0+ #30
[    0.109373] task: ffff000008d05580 task.stack: ffff000008cc0000
[    0.115356] PC is at acpi_ns_lookup+0x25c/0x3c0
[    0.119929] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[    0.125117] pc : [<ffff0000084a862c>] lr : [<ffff00000849d3c0>] pstate: 60000045
[    0.132589] sp : ffff000008ccfb40
[    0.135930] x29: ffff000008ccfb40 x28: ffff000008a9c18c
[    0.141295] x27: ffff0000088be820 x26: 0000000000000000
[    0.146659] x25: 000000000000001b x24: 0000000000000001
[    0.152024] x23: 0000000000000001 x22: ffff000009f10027
[    0.157389] x21: ffff000008ccfc50 x20: 0000000000000001
[    0.162753] x19: 000000000000001b x18: 0000000000000005
[    0.168117] x17: 0000000000000000 x16: 0000000000000000
[    0.173481] x15: 0000000000000000 x14: 000000000000038e
[    0.178846] x13: ffffffff00000000 x12: ffffffffffffffff
[    0.184210] x11: 0000000000000006 x10: 00000000ffffff76
[    0.189574] x9 : 000000000000005f x8 : ffff800014670140
[    0.194939] x7 : 0000000000000000 x6 : ffff000008ccfc50
[    0.200303] x5 : ffff800012d45000 x4 : 0000000000000001
[    0.205668] x3 : ffff000008ccfbe0 x2 : ffff0000095e3a00
[    0.211032] x1 : ffff000009f10027 x0 : 0000000000000000
[    0.216397] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[    0.223166] Call trace:
[    0.225629] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[    0.232136] fa00: 0000000000000000 ffff000009f10027 ffff0000095e3a00 ffff000008ccfbe0
[    0.240048] fa20: 0000000000000001 ffff800012d45000 ffff000008ccfc50 0000000000000000
[    0.247960] fa40: ffff800014670140 000000000000005f 00000000ffffff76 0000000000000006
[    0.255872] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
[    0.263785] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
[    0.271697] faa0: 0000000000000001 ffff000008ccfc50 ffff000009f10027 0000000000000001
[    0.279609] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
[    0.287521] fae0: ffff000008a9c18c ffff000008ccfb40 ffff00000849d3c0 ffff000008ccfb40
[    0.295433] fb00: ffff0000084a862c 0000000060000045 ffff000008ccfb40 ffff000008261918
[    0.303345] fb20: ffffffffffffffff ffff0000087f193c ffff000008ccfb40 ffff0000084a862c
[    0.311258] [<ffff0000084a862c>] acpi_ns_lookup+0x25c/0x3c0
[    0.316885] [<ffff00000849d3c0>] acpi_ds_load1_begin_op+0xa4/0x294
[    0.323128] [<ffff0000084af374>] acpi_ps_build_named_op+0xc4/0x198
[    0.329371] [<ffff0000084af594>] acpi_ps_create_op+0x14c/0x270
[    0.335262] [<ffff0000084aee70>] acpi_ps_parse_loop+0x188/0x5c8
[    0.341241] [<ffff0000084aff10>] acpi_ps_parse_aml+0xb0/0x2b8
[    0.347044] [<ffff0000084aacd8>] acpi_ns_one_complete_parse+0x144/0x184
[    0.353726] [<ffff0000084aad60>] acpi_ns_parse_table+0x48/0x68
[    0.359616] [<ffff0000084aa194>] acpi_ns_load_table+0x4c/0xdc
[    0.365420] [<ffff0000084b51c0>] acpi_tb_load_namespace+0xe4/0x264
[    0.371664] [<ffff000008bafd64>] acpi_load_tables+0x48/0xc0
[    0.377292] [<ffff000008badfd0>] acpi_early_init+0x9c/0xd0
[    0.382832] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
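
A few fields of the oops above can be decoded by hand. The sketch below assumes the ARMv8-A layout of ESR_EL1 (EC in bits [31:26], DFSC in bits [5:0]) and a stage-1 page descriptor for the 64k granule with 48-bit addresses (output address in bits [47:16], AttrIndx in bits [4:2]). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Exception class: ESR bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return esr >> 26;
}

/* Data fault status code: ESR bits [5:0]. */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* Output address of a 64k-granule page descriptor: bits [47:16]. */
static uint64_t pte_to_phys_64k(uint64_t pte)
{
	return pte & 0x0000ffffffff0000ULL;
}

/* Memory attribute index of a stage-1 descriptor: bits [4:2]. */
static unsigned int pte_attrindx(uint64_t pte)
{
	return (pte >> 2) & 0x7;
}

/* Physical address of the access: page frame plus offset within the 64k page. */
static uint64_t fault_phys_64k(uint64_t pte, uint64_t vaddr)
{
	return pte_to_phys_64k(pte) | (vaddr & 0xffffULL);
}

/* Values taken from the oops above. */
static void decode_example(void)
{
	assert(esr_ec(0x96000021) == 0x25);	/* data abort, current EL */
	assert(esr_dfsc(0x96000021) == 0x21);	/* alignment fault */
	assert(pte_to_phys_64k(0x00e8000039710707ULL) == 0x39710000ULL);
	assert(pte_attrindx(0x00e8000039710707ULL) == 1);
	assert(fault_phys_64k(0x00e8000039710707ULL,
			      0xffff000009f10027ULL) == 0x39710027ULL);
}
```

Under these assumptions, the faulting access targets physical address 0x39710027, which lies inside the DSDT at 0x39710000, and DFSC 0x21 denotes an alignment fault rather than a translation fault, which is consistent with the valid pte shown in the log. What memory type AttrIndx 1 selects depends on how MAIR_EL1 is programmed, so that part is left open here.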

So, I am looking into what could be causing the 'Boot Code' and 'ACPI DSDT table' ranges to be merged into a single region at '0x0000396c0000-0x00003975ffff', which the 'memblock_is_reserved' check on its base address fails to mark as reserved.

Any pointers?

Regards,
Bhupesh
