On Thu, Sep 24, 2020 at 05:05:45AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Bhupesh,
>
> Thank you for the updated patch.
>
> -----Original Message-----
> > With ARMv8.2-LVA architecture extension availability, arm64 hardware
> > which supports this extension can support up to 52-bit virtual
> > addresses. It is especially useful for having a 52-bit user-space virtual
> > address space while the kernel can still retain 48-bit/52-bit virtual
> > addressing.
> >
> > Since at the moment we enable the support of this extension in the
> > kernel via a CONFIG flag (CONFIG_ARM64_VA_BITS_52), there are
> > no clear mechanisms in user-space to determine this CONFIG
> > flag value and use it to determine the kernel-space VA address range
> > values.
> >
> > 'makedumpfile' can instead use 'TCR_EL1.T1SZ' value from vmcoreinfo
> > which indicates the size offset of the memory region addressed by
> > TTBR1_EL1 (and hence can be used for determining the
> > vabits_actual value).
> >
> > Using the vmcoreinfo variable exported by kernel commit
> > bbdbc11804ff ("arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo"),
> > the user-space can use the following computation for determining whether
> > an address lies in the linear map range (for newer kernels >= 5.4):
> >
> >   #define __is_lm_address(addr)	(!(((u64)addr) & BIT(vabits_actual - 1)))
> >
> > Note that for the --mem-usage case though we need to calculate
> > vabits_actual value before the vmcoreinfo read functionality is ready,
>
> For this, can't we read the TCR_EL1.T1SZ from vmcoreinfo in /proc/kcore's
> ELF note?  I think we can use the common functions used for vmcore with it.
>
> I'll write a patch to do so if it sounds good.
>
> > so we can instead read the architecture register ID_AA64MMFR2_EL1
> > directly to see if the underlying hardware supports 52-bit addressing
> > and accordingly set vabits_actual as:
> >
> >    read_id_aa64mmfr2_el1();
> >    if (hardware supports 52-bit addressing)
> >	vabits_actual = 52;
> >    else
> >	vabits_actual = va_bits value calculated via _stext symbol;
> >
> > Also make sure that the page_offset, is_linear_addr(addr) and __pa()
> > calculations work both for older (< 5.4) and newer kernels (>= 5.4).
> >
> > I have tested several combinations with both kernel categories
> > [e.g. with different VA (39, 42, 48 and 52-bit) and PA combinations
> > (48 and 52-bit)] on at least 3 different boards.
> >
> > Unfortunately, this means that we need to call 'populate_kernel_version()'
> > earlier than 'get_page_offset_arm64()' as 'info->kernel_version' remains
> > uninitialized before its first use otherwise.
>
> The populate_kernel_version() uses uname(), so this means that there will
> be some cases that makedumpfile doesn't work with vmcores which were
> captured on other kernels than running one.  This is a rather big limitation
> especially to backward-compatibility test, and it would be better to
> avoid changing behavior depending on environment, not on data.
>
> Is there no room to avoid it?

I have a new idea about this, which avoids any kernel version check.
Please see the comment inline.

>
> Just an idea, but can we use the OSRELEASE vmcoreinfo in ELF note first
> to determine the kernel version?  It's from init_uts_ns.name.release,
> why can't we use it?
>
> Thanks,
> Kazu
>
> >
> > This patch is in accordance with ARMv8 Architecture Reference Manual
> >
> > Cc: Kazuhito Hagio <k-hagio at ab.jp.nec.com>
> > Cc: John Donnelly <john.p.donnelly at oracle.com>
> > Cc: kexec at lists.infradead.org
> > Signed-off-by: Bhupesh Sharma <bhsharma at redhat.com>
> > ---
> >  arch/arm64.c   | 233 ++++++++++++++++++++++++++++++++++++++++++-------
> >  common.h       |  10 +++
> >  makedumpfile.c |   4 +-
> >  makedumpfile.h |   6 +-
> >  4 files changed, 218 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/arm64.c b/arch/arm64.c
> > index 709e0a506916..ccaa8641ca66 100644
> > --- a/arch/arm64.c
> > +++ b/arch/arm64.c
> > @@ -19,10 +19,23 @@
> >
> >  #ifdef __aarch64__
> >
> > +#include <asm/hwcap.h>
> > +#include <sys/auxv.h>
> >  #include "../elf_info.h"
> >  #include "../makedumpfile.h"
> >  #include "../print_info.h"
> >
> > +/* ID_AA64MMFR2_EL1 related helpers: */
> > +#define ID_AA64MMFR2_LVA_SHIFT	16
> > +#define ID_AA64MMFR2_LVA_MASK	(0xf << ID_AA64MMFR2_LVA_SHIFT)
> > +
> > +/* CPU feature ID registers */
> > +#define get_cpu_ftr(id) ({ \
> > +		unsigned long __val; \
> > +		asm volatile("mrs %0, " __stringify(id) : "=r" (__val)); \
> > +		__val; \
> > +})
> > +
> >  typedef struct {
> >  	unsigned long pgd;
> >  } pgd_t;
> > @@ -47,6 +60,7 @@ typedef struct {
> >  static int lpa_52_bit_support_available;
> >  static int pgtable_level;
> >  static int va_bits;
> > +static int vabits_actual;
> >  static unsigned long kimage_voffset;
> >
> >  #define SZ_4K			4096
> > @@ -58,7 +72,6 @@ static unsigned long kimage_voffset;
> >  #define PAGE_OFFSET_42		((0xffffffffffffffffUL) << 42)
> >  #define PAGE_OFFSET_47		((0xffffffffffffffffUL) << 47)
> >  #define PAGE_OFFSET_48		((0xffffffffffffffffUL) << 48)
> > -#define PAGE_OFFSET_52		((0xffffffffffffffffUL) << 52)
> >
> >  #define pgd_val(x)		((x).pgd)
> >  #define pud_val(x)		(pgd_val((x).pgd))
> > @@ -219,13 +232,25 @@ pmd_page_paddr(pmd_t pmd)
> >  #define pte_index(vaddr)		(((vaddr) >> PAGESHIFT()) & (PTRS_PER_PTE - 1))
> >  #define pte_offset(dir, vaddr)	(pmd_page_paddr((*dir)) + pte_index(vaddr) * sizeof(pte_t))
> >
> > +/*
> > + * The linear kernel range starts at the bottom of the virtual address
> > + * space. Testing the top bit for the start of the region is a
> > + * sufficient check and avoids having to worry about the tag.
> > + */
> > +#define is_linear_addr(addr)	((info->kernel_version < KERNEL_VERSION(5, 4, 0)) ? \
> > +	(!!((unsigned long)(addr) & (1UL << (vabits_actual - 1)))) : \
> > +	(!((unsigned long)(addr) & (1UL << (vabits_actual - 1)))))
> > +
> >  static unsigned long long
> >  __pa(unsigned long vaddr)
> >  {
> >  	if (kimage_voffset == NOT_FOUND_NUMBER ||
> > -			(vaddr >= PAGE_OFFSET))
> > -		return (vaddr - PAGE_OFFSET + info->phys_base);
> > -	else
> > +			is_linear_addr(vaddr)) {
> > +		if (info->kernel_version < KERNEL_VERSION(5, 4, 0))
> > +			return ((vaddr & ~PAGE_OFFSET) + info->phys_base);
> > +		else
> > +			return (vaddr + info->phys_base - PAGE_OFFSET);
> > +	} else
> >  		return (vaddr - kimage_voffset);
> >  }
> >
> > @@ -254,6 +279,7 @@ static int calculate_plat_config(void)
> >  			(PAGESIZE() == SZ_64K && va_bits == 42)) {
> >  		pgtable_level = 2;
> >  	} else if ((PAGESIZE() == SZ_64K && va_bits == 48) ||
> > +			(PAGESIZE() == SZ_64K && va_bits == 52) ||
> >  			(PAGESIZE() == SZ_4K && va_bits == 39) ||
> >  			(PAGESIZE() == SZ_16K && va_bits == 47)) {
> >  		pgtable_level = 3;
> > @@ -288,8 +314,14 @@ get_phys_base_arm64(void)
> >  		return TRUE;
> >  	}
> >
> > +	/* Ignore the 1st PT_LOAD */
> >  	if (get_num_pt_loads() && PAGE_OFFSET) {
> > -		for (i = 0;
> > +		/* Note that the following loop starts with i = 1.
> > +		 * This is required to make sure that the following logic
> > +		 * works both for old and newer kernels (with flipped
> > +		 * VA space, i.e. >= 5.4.0)
> > +		 */
> > +		for (i = 1;
> >  		    get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
> >  		    i++) {
> >  			if (virt_start != NOT_KV_ADDR
> > @@ -346,6 +378,139 @@ get_stext_symbol(void)
> >  	return(found ? kallsym : FALSE);
> >  }
> >
> > +static int
> > +get_va_bits_from_stext_arm64(void)
> > +{
> > +	ulong _stext;
> > +
> > +	_stext = get_stext_symbol();
> > +	if (!_stext) {
> > +		ERRMSG("Can't get the symbol of _stext.\n");
> > +		return FALSE;
> > +	}
> > +
> > +	/* Derive va_bits as per arch/arm64/Kconfig. Note that this is a
> > +	 * best case approximation at the moment, as there can be
> > +	 * inconsistencies in this calculation (e.g., for the
> > +	 * 52-bit kernel VA case, the 48th bit is set in
> > +	 * the _stext symbol).
> > +	 *
> > +	 * So, we need to rely on the vabits_actual symbol in the
> > +	 * vmcoreinfo or read via system register for an accurate value
> > +	 * of the virtual addressing supported by the underlying kernel.
> > +	 */
> > +	if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
> > +		va_bits = 48;
> > +	} else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
> > +		va_bits = 47;
> > +	} else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
> > +		va_bits = 42;
> > +	} else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
> > +		va_bits = 39;
> > +	} else if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
> > +		va_bits = 36;
> > +	} else {
> > +		ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
> > +		return FALSE;
> > +	}
> > +
> > +	DEBUG_MSG("va_bits : %d (approximation via _stext)\n", va_bits);
> > +
> > +	return TRUE;
> > +}
> > +
> > +/* Note that the
> > + * ID_AA64MMFR2_EL1 architecture register can be read
> > + * only when we give an .arch hint to the gcc/binutils,
> > + * so we use the gcc construct '__attribute__ ((target ("arch=armv8.2-a")))'
> > + * here which is an .arch directive (see AArch64-Target-selection-directives
> > + * documentation from ARM for details). This is required only for
> > + * this function to make sure it compiles well with gcc/binutils.
> > + */
> > +__attribute__ ((target ("arch=armv8.2-a")))
> > +static unsigned long
> > +read_id_aa64mmfr2_el1(void)
> > +{
> > +	return get_cpu_ftr(ID_AA64MMFR2_EL1);
> > +}
> > +
> > +static int
> > +get_vabits_actual_from_id_aa64mmfr2_el1(void)
> > +{
> > +	int l_vabits_actual;
> > +	unsigned long val;
> > +
> > +	/* Check if ID_AA64MMFR2_EL1 CPU-ID register indicates
> > +	 * ARMv8.2/LVA support:
> > +	 * VARange, bits [19:16]
> > +	 *   From ARMv8.2:
> > +	 *   Indicates support for a larger virtual address.
> > +	 *   Defined values are:
> > +	 *     0b0000 VMSAv8-64 supports 48-bit VAs.
> > +	 *     0b0001 VMSAv8-64 supports 52-bit VAs when using the 64KB
> > +	 *            page size. The other translation granules support
> > +	 *            48-bit VAs.
> > +	 *
> > +	 * See ARMv8 ARM for more details.
> > +	 */
> > +	if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
> > +		ERRMSG("arm64 CPUID registers unavailable.\n");
> > +		return ERROR;
> > +	}
> > +
> > +	val = read_id_aa64mmfr2_el1();
> > +	val = (val & ID_AA64MMFR2_LVA_MASK) >> ID_AA64MMFR2_LVA_SHIFT;
> > +
> > +	if ((val == 0x1) && (PAGESIZE() == SZ_64K))
> > +		l_vabits_actual = 52;
> > +	else
> > +		l_vabits_actual = 48;
> > +
> > +	return l_vabits_actual;
> > +}
> > +
> > +static void
> > +get_page_offset_arm64(void)
> > +{
> > +	/* Check if 'vabits_actual' is initialized yet.
> > +	 * If not, our best bet is to read ID_AA64MMFR2_EL1 CPU-ID
> > +	 * register.
> > +	 */
> > +	if (!vabits_actual) {
> > +		vabits_actual = get_vabits_actual_from_id_aa64mmfr2_el1();
> > +		if ((vabits_actual == ERROR) || (vabits_actual != 52)) {
> > +			/* If we cannot read ID_AA64MMFR2_EL1 arch
> > +			 * register or if this register does not indicate
> > +			 * support for a larger virtual address, our last
> > +			 * option is to use the VA_BITS to calculate the
> > +			 * PAGE_OFFSET value, i.e. vabits_actual = VA_BITS.
> > +			 */
> > +			vabits_actual = va_bits;
> > +			DEBUG_MSG("vabits_actual : %d (approximation via va_bits)\n",
> > +					vabits_actual);
> > +		} else
> > +			DEBUG_MSG("vabits_actual : %d (via id_aa64mmfr2_el1)\n",
> > +					vabits_actual);
> > +	}
> > +
> > +	if (!populate_kernel_version()) {
> > +		ERRMSG("Cannot get information about current kernel\n");
> > +		return;
> > +	}
> > +
> > +	/* See arch/arm64/include/asm/memory.h for more details of
> > +	 * the PAGE_OFFSET calculation.
> > +	 */
> > +	if (info->kernel_version < KERNEL_VERSION(5, 4, 0))
> > +		info->page_offset = ((0xffffffffffffffffUL) -
> > +				((1UL) << (vabits_actual - 1)) + 1);
> > +	else
> > +		info->page_offset = (-(1UL << vabits_actual));
> > +

Consider the following related commits, in order:

  b6d00d47e81a arm64: mm: Introduce 52-bit Kernel VAs                    (2)
  ce3aaed87344 arm64: mm: Modify calculation of VMEMMAP_SIZE
  c8b6d2ccf9b1 arm64: mm: Separate out vmemmap
  c812026c54cf arm64: mm: Logic to make offset_ttbr1 conditional
  5383cc6efed1 arm64: mm: Introduce vabits_actual
  90ec95cda91a arm64: mm: Introduce VA_BITS_MIN
  99426e5e8c9f arm64: dump: De-constify VA_START and KASAN_SHADOW_START
  6bd1d0be0e97 arm64: kasan: Switch to using KASAN_SHADOW_OFFSET
  14c127c957c1 arm64: mm: Flip kernel VA space                           (1)

And

  #define _PAGE_END(va)	(-(UL(1) << ((va) - 1)))
  #define PAGE_OFFSET	(((0xffffffffffffffffUL) - ((1UL) << (vabits_actual - 1)) + 1))	//old
  #define PAGE_OFFSET	(-(1UL << vabits_actual))	//new

Before (1), SYMBOL(_text) < PAGE_OFFSET. Afterwards,
SYMBOL(_text) > PAGE_END, and PAGE_END has the same value as the old
PAGE_OFFSET.

So the comparison of kernel versions can be replaced by:

  if SYMBOL(_text) > PAGE_END
	info->page_offset = new PAGE_OFFSET
  else
	info->page_offset = old PAGE_OFFSET

Any comments?

Thanks,
Pingfan
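
To make the proposal above concrete, here is a minimal standalone sketch in C of the _text-based check. It is only an illustration under stated assumptions, not code from makedumpfile: the helper names (page_end, flipped_page_offset, guess_page_offset) are hypothetical, and it assumes that vabits_actual and the address of _text have already been obtained by some other means.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old (pre-flip) PAGE_OFFSET. Per the macros quoted above, this has the
 * same value as PAGE_END on kernels with the flipped VA space.
 * Hypothetical helper, not part of makedumpfile.
 */
static uint64_t page_end(unsigned int vabits)
{
	assert(vabits >= 2 && vabits < 64);
	return ~UINT64_C(0) - (UINT64_C(1) << (vabits - 1)) + 1;
}

/* New (post-flip) PAGE_OFFSET: -(1UL << vabits_actual). Hypothetical helper. */
static uint64_t flipped_page_offset(unsigned int vabits)
{
	assert(vabits >= 2 && vabits < 64);
	return -(UINT64_C(1) << vabits);
}

/*
 * Pick PAGE_OFFSET from the position of _text instead of the kernel
 * version: if _text lies above PAGE_END, the VA space is flipped.
 */
static uint64_t guess_page_offset(uint64_t text_sym, unsigned int vabits)
{
	if (text_sym > page_end(vabits))
		return flipped_page_offset(vabits);
	return page_end(vabits);
}
```

With 48 VA bits and illustrative (not measured) addresses, a _text of 0xffff000008080000 falls below PAGE_END and selects the old PAGE_OFFSET 0xffff800000000000, while a _text of 0xffff800010080000 lies above PAGE_END and selects the new PAGE_OFFSET 0xffff000000000000. The check depends only on data from the dump, so it would not need uname().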