Re: [PATCH] makedumpfile/arm64: Add support for ARMv8.2-LPA (52-bit PA support)

Bhupesh Sharma <bhsharma@xxxxxxxxxx> · Wed, 13 Feb 2019 00:52:55 +0530




Hi Kazu,

On 02/12/2019 10:45 PM, Kazuhito Hagio wrote:
Hi Bhupesh,

On 2/12/2019 3:44 AM, Bhupesh Sharma wrote:
+/* See 'arch/arm64/include/asm/pgtable-hwdef.h' for definitions below */
+
+/*
+ * Number of page-table levels required to address 'va_bits' wide
+ * address, without section mapping. We resolve the top (va_bits - PAGE_SHIFT)
+ * bits with (PAGE_SHIFT - 3) bits at each page table level. Hence:
+ *
+ *  levels = DIV_ROUND_UP((va_bits - PAGE_SHIFT), (PAGE_SHIFT - 3))
+ *
+ * where DIV_ROUND_UP(n, d) => (((n) + (d) - 1) / (d))
+ *
+ * We cannot include linux/kernel.h which defines DIV_ROUND_UP here
+ * due to build issues. So we open code DIV_ROUND_UP here:
+ *
+ *     ((((va_bits) - PAGE_SHIFT) + (PAGE_SHIFT - 3) - 1) / (PAGE_SHIFT - 3))
+ *
+ * which gets simplified as :
+ */
+#define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))

Is this needed?

Yes, it is needed for both the LPA and LVA patches (the LVA patch was sent out separately), since we need to calculate values like PMD_SHIFT on basis of the page-table levels.

Also this makes the makedumpfile code confirm more to the recent kernel definitions as otherwise one needs to read the makedumpfile code and dig kernel change history to understand the calculation of these macros.

In future also, I plan to keep these values in sync with the kernel and send patches accordingly.

I agree to sync these macros with the kernel, but currently this one
is not used in your patches, and you wrote the path of the source file
(pgtable-hwdef.h) above for reference, so we don't need to import this
one for now. I'd like to import it when it is needed.

Ok, now I understand. You mean 'ARM64_HW_PGTABLE_LEVELS' macro here only 
and not 'ARM64_HW_PGTABLE_LEVEL_SHIFT' macro also, right? If yes, I 
agree to removing the former. Will fix this in v2.

+/* Highest possible physical address supported */
+static inline int
+get_phy_mask_shift_arm64(void)
+{
+       int pa_bits = 48 /* default: 48-bit PA */;
+
+       if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
+               pa_bits = NUMBER(MAX_PHYSMEM_BITS);
+
+       return pa_bits;
+}

Is this needed to be an inline function?

IMO, its better to keep this as inline.

Once you modify the part of setting info->max_physmem_bits below,
we can remove this function, because we don't use PHYS_MASK_SHIFT
anymore.

Ok.

-#define PMD_TYPE_MASK          3
-#define PMD_TYPE_SECT          1
-#define PMD_TYPE_TABLE         3
+#define PHYS_MASK_SHIFT                get_phy_mask_shift_arm64()

-#define PUD_TYPE_MASK          3
-#define PUD_TYPE_SECT          1
-#define PUD_TYPE_TABLE         3
+/* Helper API to convert between a physical address and its placement
+ * in a page table entry, taking care of 52-bit addresses.
+ */
+static inline unsigned long
+__pte_to_phys(pte_t pte)
+{
+       if (lpa_52_bit_support_available)

OK.

According to the related emails, it looks like "PAGESIZE() == SZ_64K"
is also usable here, but this is the same condition as kernel's one
and easy to understand.

No, like I clarified to the upstream maintainers, distributions like Fedora already support a default page size of 64K, but the PA address space can be 48 or 52, depending on the kernel version and kernel CONFIG flags used.

My understanding is that with 64k page, we can convert a page table
entry to a physicall address without awareness of 52-bit.

According to this patch, the top 4 bits of a 52-bit physical address
are positioned at bits 12..15 of a page table entry.

commit 75387b92635e7dca410c1ef92cfe510019248f76
Author: Kristina Martsenko <kristina.martsenko@xxxxxxx>
Date:   Wed Dec 13 17:07:21 2017 +0000

     arm64: handle 52-bit physical addresses in page table entries
     
     The top 4 bits of a 52-bit physical address are positioned at bits
     12..15 of a page table entry. Introduce macros to convert between a
     physical address and its placement in a table entry, and change all
     macros/functions that access PTEs to use them.

With 64k page and non-52-bit kernel, it looks like the bits 12..15
are zero, so we can move the zeros to bits 49..51 because the zeros
don't affect the PA, for example:

       52-bit              non-52-bit (48-bit)
   PTE 0x0000123456789000  0x0000123456780000
            v--------^          v--------^   __pte_to_phys() w/52-bit support
   PA  0x0009123456780000  0x0000123456780000

I think that this was what the upstream maintainers said..
But "if (lpa_52_bit_support_available)" is also fine for me.

Well from my experience on arm32 and arm64 hardware enablement, assuming 
values of implementation defined fields for arm hardware can be risky :)

Lets see what the ARMv8 architecture reference manual says about the 
Bits [15:12] for a 64KB page size:

"Bits[15:12] of each valid translation table descriptor hold bits[51:48] 
of the output address, or of the address of the translation table to be 
used for the initial lookup at the next level of translation. If the 
implementation does not support 52-bit physical addresses, then it is 
IMPLEMENTATION DEFINED whether non-zero values for these bits generate 
an Address size fault."

So, it is unsafe to assume that for a 48-bit physical address 
implementation the Bits[15:12] cannot have non-zero values on certain 
hardware implementations. So assuming that these bits are always 0 and 
can be easily moved to Bits[51:48] for a 64K page and non-52bit kernel, 
can lead to IMPLEMENTATION DEFINED behavior.

Its better instead to have a predictable behavior in such cases and by 
explicitly calculating the paddress values using the right PTE High and 
Low masks in these cases we can minimize any hardware IMPLEMENTATION 
specific details (just like the handling done in kernel space).

@@ -287,6 +481,15 @@ get_stext_symbol(void)
   int
   get_machdep_info_arm64(void)
   {
+       int pa_bits;
+
+       /* Determine if the PA address range is 52-bits: ARMv8.2-LPA */
+       pa_bits = get_phy_mask_shift_arm64();
+       DEBUG_MSG("pa_bits    : %d\n", pa_bits);
+
+       if (pa_bits == 52)
+               lpa_52_bit_support_available = 1;
+

This looks a bit redundant, so for example

    if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER) {
      info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
      if (info->max_physmem_bits == 52)
        lpa_52_bit_support_available = 1;
    } else
      info->max_physmem_bits = 48;

Ok.


@@ -425,14 +628,13 @@ vaddr_to_paddr_arm64(unsigned long vaddr)
                          ERRMSG("Can't get a valid pte.\n");
                          return NOT_PADDR;
                  } else {
-
-                       paddr = (PAGEBASE(pte_val(ptev)) & PHYS_MASK)
+                       paddr = (PAGEBASE(pte_val(ptev)) & PTE_ADDR_MASK)
                                          + (vaddr & (PAGESIZE() - 1));

I think __pte_to_phys() is needed also here, not PTE_ADDR_MASK.

I had some issues with the same on ARMv8 simulator, so lets stick with the tested 'PTE_ADDR_MASK' usage for now.

Did you test this PTE_ADDR_MASK line on a system actually using
52-bit PA? If a 52-bit physical address is actually used, this will
return a wrong address, for example:

   PTE_ADDR_MASK  0x0000fffffffff000
   PTE            0x0000123456789000
   PAGEBASE'd     0x0000123456780000
    v
   paddr          0x0000123456780000 + 64k offset // incorrect

With 52-bit PA, the PTE_ADDR_MASK is used for __phys_to_pte_val(),
not __pte_to_phys().

If I understand correctly, we should need __pte_to_phys() also here.

   paddr = __pte_to_phys(ptev)
                   + (vaddr & (PAGESIZE() - 1));

Could you try this?

Yes, the ARMv8 simulator can support both ARMv8.2 LPA and LVA features 
and I tested the above suggestion earlier also on the same and landed 
into incorrect paddress calculation issues.

Since the simulator is slow and its time taking to insert rebuilt 
user-space components in the Linaro initramfs being used on the same, I 
would suggest that we go ahead with the above code for now and later 
when I have more results from the simulator/real hardware, I will send 
out a follow-up patch (if needed) to fix the paddress calculation.

Thanks,
Bhupesh


_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec