Hi Will,

On Wed, Jun 13, 2018 at 3:41 PM, Will Deacon <will.deacon@xxxxxxx> wrote:
> On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote:
>> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@xxxxxxx> wrote:
>> > On 12/06/18 09:25, Bhupesh Sharma wrote:
>> >> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
>> >> <ard.biesheuvel@xxxxxxxxxx> wrote:
>> >>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@xxxxxxxxxx> wrote:
>> >>>> The start of the linear region map on a KASLR enabled ARM64 machine -
>> >>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
>> >>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
>> >>>> since it is defined as:
>> >>>>
>> >>>> (UL(1) << (VA_BITS - 1)) + 1)
>> >
>> >>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
>> >>> can be sparsely populated with actual memory, regardless of whether
>> >>> KASLR is in effect or not. The only difference in the presence of
>> >>> KASLR is that there may be such a hole at the beginning, but that does
>> >>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
>> >>> now wrong.
>> >
>> >>>> So taking an example of a platform with VA_BITS=48, this gives a static
>> >>>> value of:
>> >>>> PAGE_OFFSET = 0xffff800000000000
>> >>>>
>> >>>> However, for the KASLR case, we use the 'memstart_offset_seed'
>> >>>> to randomize the linear region - since 'memstart_addr' indicates the
>> >>>> start of physical RAM, we randomize the same on basis
>> >>>> of 'memstart_offset_seed' value.
>> >>>>
>> >>>> As the PAGE_OFFSET value is used presently by several user space
>> >>>> tools (for e.g.
>> >>>> makedumpfile and crash tools) to determine the start
>> >>>> of linear region and hence to read addresses (like PT_NOTE fields) from
>> >>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
>> >>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
>> >>>> the start of linear region for the KASLR cases and default to
>> >>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
>> >>>> linear region.
>> >
>> >>> Userland code that assumes that the linear map cannot have a hole at
>> >>> the beginning should be fixed.
>> >
>> >> That is a separate case (although that needs fixing as well via a
>> >> kernel patch probably as the user-space tools rely on '/proc/iomem'
>> >> contents to determine the first System RAM/reserved range).
>> >
>> > This is for kexec-tools generating the kdump vmcore ELF headers in user-space?
>>
>> Yes, but again, I would like to reiterate that the case where I see a
>> hole at the start of the System RAM range (as I listed above) is just
>> a specific case, which probably deserves a separate patch. The current
>> patch though is for a generic issue (please see more details below).
>>
>> >> 1. In that particular case (see [1]) the EFI firmware sets the first
>> >> EFI block as EfiReservedMemType:
>> >>
>> >> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
>> >> Region2: 0x000000200000-0x00000021fffff [EfiRuntimeServiceData]
>> >>
>> >> Since EFI firmware won't return the "EfiReservedMemType" memory to
>> >> Linux kernel,
>> >
>> > (Its linux that makes this choice in
>> > drivers/firmware/efi/arm-init.c::is_usable_memory())
>> >
>> >
>> >> so the kernel can't get any info about the first mem
>> >> block, and kernel can only see region2 as below:
>> >>
>> >> efi: Processing EFI memory map:
>> >> efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | |
>> >> | | | | |WB|WT|WC|UC]
>> >>
>> >> # head -1 /proc/iomem
>> >> 00200000-0021ffff : reserved
>> >>
>> >> 2a.
>> >> If we add debug prints to 'arch/arm64/mm/init.c' to print the
>> >> kernel Virtual map we can see that the memory node is set to:
>> >>
>> >> # dmesg | grep memory
>> >> ..........
>> >> memory : 0xffff800000200000 - 0xffff801800000000
>> >>
>> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
>> >> if we use 'readelf' to get the last program Header from vmcore (logs
>> >> below are for the non-kaslr case):
>> >>
>> >> # readelf -l vmcore
>> >>
>> >> ELF Header:
>> >> ........................
>> >>
>> >> Program Headers:
>> >>   Type    Offset             VirtAddr           PhysAddr
>> >>           FileSiz            MemSiz             Flags  Align
>> >>   ................................................................
>> >>   LOAD    0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>> >>           0x0000001680000000 0x0000001680000000 RWE    0
>> >>
>> >> 3. So if we do a simple calculation:
>> >>
>> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
>> >> 0xFFFF8017FFE00000 != 0xffff801800000000.
>> >>
>> >> which indicates that the end virtual memory nodes are not the same
>> >> between vmlinux and vmcore.
>> >
>> > If I've followed this properly: the problem is that to generate the ELF headers
>> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the
>> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem.
>> >
>> > The problem you are hitting is an invisible hole at the beginning of RAM,
>> > meaning user-space's guess_phys_to_virt() is off by the size of this hole.
>> >
>> > Isn't KASLR a special case for this? You must have to correct for that after
>> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this?
>>
>> No, I hit this issue both for the KASLR and non-KASLR boot cases. We
>> can fix this either in kernel or user-space.
>>
>> Fixing this in kernel space seems better to me as the definition of
>> 'memstart_addr' is that it indicates the start of the physical ram,
>> but since in this case there is a hole at the start of the system ram
>> visible in Linux (and thus to user-space), but 'memstart_addr' is
>> still 0 which seems contradictory at the least. This causes PHY_OFFSET
>> to be 0 as well, which is again contradictory.
>
> Contradictory to who?

I meant that the 'memstart_addr' and PHYS_OFFSET values are computed as
0 in the particular case above, which does not reflect the real start
of System RAM: the first memory block available to Linux starts at 2MB,
as confirmed by the 'memblock_start_of_DRAM()' value of 0x200000 and by
'/proc/iomem':

# head -1 /proc/iomem
00200000-0021ffff : reserved

> Userspace has no business messing around with this
> stuff and I'm reluctant to make this an ABI by adding a symbol with a
> special name. Why can't the various constants needed by these tools be
> exported in the ELF headers for kcore/vmcore, or as a NOTE as James
> suggests? That sounds a lot less fragile to me.

But we already add the 'memstart_addr' variable to kallsyms in the
kernel, don't we? User-space tools already use it, so we have a
precedent available.

Again, this patch was an attempt to start a conversation, as my earlier
query about how to determine the base of the linear range did not reach
a conclusion (please see [1]). The options I listed there were:
- reading 'memstart_addr' and back-computing the start of the linear range, or
- adding a new variable (which this patch does), or
- using some other approach.

[1] https://www.spinics.net/lists/arm-kernel/msg655933.html

Regards,
Bhupesh

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
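
As a footnote to step 3 in the quoted mail, the mismatch can be
reproduced with a few lines of arithmetic. This is only a sketch under
stated assumptions: it models the linear-map translation as
virt = (phys - phys_offset) + PAGE_OFFSET, and it assumes that the
user-space guess takes the start of the first '/proc/iomem' entry as the
physical offset while the kernel uses 'memstart_addr' (0 here). The
helper phys_to_virt() and the constant names are made up for
illustration and are not taken from the kernel or kexec-tools sources.

```python
# Sketch only: simplified model of the linear-map translation,
#   virt = (phys - phys_offset) + PAGE_OFFSET
# The helper and constant names below are hypothetical.

PAGE_OFFSET = 0xffff800000000000  # VA_BITS=48 value quoted in the thread


def phys_to_virt(phys, phys_offset):
    """Translate a physical address into the linear map (simplified model)."""
    return (phys - phys_offset) + PAGE_OFFSET


# Values copied from the readelf and /proc/iomem output in the thread.
LOAD_PHYS = 0x0000000180000000   # PhysAddr of the last PT_LOAD segment
LOAD_MEMSZ = 0x0000001680000000  # MemSiz of the last PT_LOAD segment
KERNEL_PHYS_OFFSET = 0x0         # memstart_addr as reported in the thread
IOMEM_FIRST_START = 0x00200000   # start of the first /proc/iomem entry

# End of the segment as the kernel maps it (phys offset 0).
kernel_end = phys_to_virt(LOAD_PHYS, KERNEL_PHYS_OFFSET) + LOAD_MEMSZ

# Start and end of the segment as guessed from /proc/iomem (assumption).
guessed_start = phys_to_virt(LOAD_PHYS, IOMEM_FIRST_START)
guessed_end = guessed_start + LOAD_MEMSZ

print(hex(guessed_start))             # 0xffff80017fe00000 (vmcore VirtAddr)
print(hex(guessed_end))               # 0xffff8017ffe00000
print(hex(kernel_end))                # 0xffff801800000000 (dmesg memory end)
print(hex(kernel_end - guessed_end))  # 0x200000, the size of the hole
```

Under this model the two end addresses differ by exactly 2MB, which is
the size of the EfiReservedMemType region at the start of RAM.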