Hi Bhupesh, Ard,

On 12/06/18 09:25, Bhupesh Sharma wrote:
> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
> <ard.biesheuvel@xxxxxxxxxx> wrote:
>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@xxxxxxxxxx> wrote:
>>> The start of the linear region map on a KASLR enabled ARM64 machine -
>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
>>> since it is defined as:
>>>
>>> (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)

>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
>> can be sparsely populated with actual memory, regardless of whether
>> KASLR is in effect or not. The only difference in the presence of
>> KASLR is that there may be such a hole at the beginning, but that does
>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
>> now wrong.

>>> So taking an example of a platform with VA_BITS=48, this gives a static
>>> value of:
>>> PAGE_OFFSET = 0xffff800000000000
>>>
>>> However, for the KASLR case, we use the 'memstart_offset_seed'
>>> to randomize the linear region - since 'memstart_addr' indicates the
>>> start of physical RAM, we randomize the same on basis
>>> of 'memstart_offset_seed' value.
>>>
>>> As the PAGE_OFFSET value is used presently by several user space
>>> tools (for e.g. makedumpfile and crash tools) to determine the start
>>> of linear region and hence to read addresses (like PT_NOTE fields) from
>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
>>> the start of linear region for the KASLR cases and default to
>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
>>> linear region.

>> Userland code that assumes that the linear map cannot have a hole at
>> the beginning should be fixed.
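
For reference, the static value quoted for VA_BITS=48 can be reproduced with
a minimal sketch. It assumes the arm64 definition of that era,
PAGE_OFFSET = 0xffffffffffffffff - (1 << (VA_BITS - 1)) + 1; the helper name
page_offset() is hypothetical and is not a kernel or kexec-tools function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: virtual address of the start of the arm64 linear
 * map for a given VA_BITS, assuming
 *   PAGE_OFFSET = 0xffffffffffffffff - (1 << (VA_BITS - 1)) + 1.
 */
static uint64_t page_offset(unsigned int va_bits)
{
	/* The shift below is only defined for 1 <= va_bits <= 64. */
	assert(va_bits >= 1 && va_bits <= 64);
	return UINT64_C(0xffffffffffffffff) - (UINT64_C(1) << (va_bits - 1)) + 1;
}
```

page_offset(48) evaluates to 0xffff800000000000. The value depends only on
VA_BITS, which is why it stays the same whether or not KASLR leaves a hole
at the start of the linear map.
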
> That is a separate case (although that needs fixing as well via a
> kernel patch probably as the user-space tools rely on '/proc/iomem'
> contents to determine the first System RAM/reserved range).

This is for kexec-tools generating the kdump vmcore ELF headers in user-space?

> 1. In that particular case (see [1]) the EFI firmware sets the first
> EFI block as EfiReservedMemType:
>
> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
> Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>
> Since EFI firmware won't return the "EfiReservedMemType" memory to
> Linux kernel,

(It's Linux that makes this choice in
drivers/firmware/efi/arm-init.c::is_usable_memory())

> so the kernel can't get any info about the first mem
> block, and kernel can only see region2 as below:
>
> efi: Processing EFI memory map:
> efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | |
> | | | | |WB|WT|WC|UC]
>
> # head -1 /proc/iomem
> 00200000-0021ffff : reserved
>
> 2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
> kernel Virtual map we can see that the memory node is set to:
>
> # dmesg | grep memory
> ..........
> memory : 0xffff800000200000 - 0xffff801800000000
>
> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
> if we use 'readelf' to get the last program Header from vmcore (logs
> below are for the non-kaslr case):
>
> # readelf -l vmcore
>
> ELF Header:
> ........................
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   ......................................................................
>   LOAD           0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>                  0x0000001680000000 0x0000001680000000  RWE    0
>
> 3. So if we do a simple calculation:
>
> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
> 0xFFFF8017FFE00000 != 0xffff801800000000.
>
> which indicates that the end virtual memory nodes are not the same
> between vmlinux and vmcore.

If I've followed this properly: the problem is that to generate the ELF
headers in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess
the virtual addresses of the 'System RAM' regions it can see in /proc/iomem.

The problem you are hitting is an invisible hole at the beginning of RAM,
meaning user-space's guess_phys_to_virt() is off by the size of this hole.

Isn't KASLR a special case for this? You must have to correct for that after
kdump has happened, based on an elf-note in the vmcore. Can't we always do
this?

> This happens because the kexec-tools rely on '/proc/iomem' contents
> while 'memstart_addr' is computed as 0 by kernel (as value of
> memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN).

> Returning back to this patch, this is a generic requirement where we
> need the linear region start/base addresses in user-space applications
> which is used to read addresses which lie in the linear region (for
> e.g. when we read /proc/kcore contents).

>>> I tested this on my qualcomm (which supports EFI_RNG_PROTOCOL)
>>> and apm mustang (which does not support EFI_RNG_PROTOCOL) arm64 boards
>>> and was able to use a modified user space utility (like kexec-tools and
>>> makedumpfile) to determine the start of linear region correctly for
>>> both the KASLR and non-KASLR boot cases.
>>>
>>
>> Can you explain the nature of the changes to the userland code?
>
> The changes are not to rely on the fixed PAGE_OFFSET macro value for
> determining the base address of the linear region, but rather read the
> 'linear_reg_start_addr' symbol from kernel and use the same both in
> case of KASLR and non-KASLR boots to determine the base of the linear
> region (in [2], I have implemented a test change to kexec-tools to
> read the 'linear_reg_start_addr' symbol which is available on my

Don't use /dev/mem.

> public github tree, I have a similar change available in makedumpfile
> which I have not yet pushed to github, as it implements other features
> as well)

>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>> index 49d99214f43c..bfd0915ecaf8 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -178,6 +178,9 @@ extern s64 memstart_addr;
>>>  /* PHYS_OFFSET - the physical address of the start of memory. */
>>>  #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>>>
>>> +/* the virtual base of the linear region. */
>>> +extern s64 linear_reg_start_addr;
>>> +
>>>  /* the virtual base of the kernel image (minus TEXT_OFFSET) */
>>>  extern u64 kimage_vaddr;
>>>
>>> diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
>>> index d894a20b70b2..a92238ea45ff 100644
>>> --- a/arch/arm64/kernel/arm64ksyms.c
>>> +++ b/arch/arm64/kernel/arm64ksyms.c
>>> @@ -42,6 +42,7 @@ EXPORT_SYMBOL(__arch_copy_in_user);
>>>
>>>  /* physical memory */
>>>  EXPORT_SYMBOL(memstart_addr);
>>> +EXPORT_SYMBOL(linear_reg_start_addr);
>>>
>>>  /* string / mem functions */
>>>  EXPORT_SYMBOL(strchr);
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 325cfb3b858a..29447adb0eef 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -60,6 +60,7 @@
>>>   * that cannot be mistaken for a real physical address.
>>>   */
>>>  s64 memstart_addr __ro_after_init = -1;
>>> +s64 linear_reg_start_addr __ro_after_init = PAGE_OFFSET;
>>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>>
>>>  #ifdef CONFIG_BLK_DEV_INITRD
>>> @@ -452,6 +453,8 @@ void __init arm64_memblock_init(void)
>>>  		}
>>>  	}
>>>
>>> +	linear_reg_start_addr = __phys_to_virt(memblock_start_of_DRAM());

This patch adds a variable that nothing uses, it's going to be removed.
You can't depend on reading this via /dev/mem.

Could you add the information you need as an elf-note to the vmcore instead?
You must already pick these up to handle kaslr. (from memory, this is where
the kaslr-offset is described to user-space after we kdump).


Thanks,

James
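
The mismatch in step 3 of the quoted report can be checked with a short
sketch. It assumes VA_BITS=48 and a linear-map translation of the form
virt = phys - phys_offset + PAGE_OFFSET, with the kernel using
memstart_addr = 0 and user-space guessing 0x200000 from the first
/proc/iomem entry. The helper names are hypothetical and are not
kexec-tools or kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* PAGE_OFFSET for VA_BITS=48, the value quoted in the thread. */
#define PAGE_OFFSET_48 UINT64_C(0xffff800000000000)

/* Assumed linear-map translation: virt = phys - phys_offset + PAGE_OFFSET. */
static uint64_t linear_phys_to_virt(uint64_t phys, uint64_t phys_offset)
{
	assert(phys >= phys_offset);
	return phys - phys_offset + PAGE_OFFSET_48;
}

/*
 * How far the kernel's mapping of a physical address lies above a
 * user-space guess made with a larger base of RAM.
 */
static uint64_t guess_error(uint64_t phys, uint64_t kernel_memstart,
			    uint64_t guessed_memstart)
{
	return linear_phys_to_virt(phys, kernel_memstart) -
	       linear_phys_to_virt(phys, guessed_memstart);
}
```

With the values from the report, linear_phys_to_virt(0x180000000, 0x200000)
gives 0xffff80017fe00000, the VirtAddr in the vmcore program header, and
guess_error(0x180000000, 0, 0x200000) gives 0x200000, the size of the
reserved region at the start of RAM.
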