Re: [Query] PAGE_OFFSET on KASLR enabled ARM64 kernel

On 05/31/2018 10:21 AM, Bhupesh Sharma wrote:
Hi Ard,

Sorry, I was out for most of the day yesterday. Please see my responses inline.

On Mon, May 28, 2018 at 12:16 PM, Ard Biesheuvel
<ard.biesheuvel@xxxxxxxxxx> wrote:
On 27 May 2018 at 23:03, Bhupesh Sharma <bhsharma@xxxxxxxxxx> wrote:
Hi ARM64 maintainers,

I am confused about the PAGE_OFFSET value (or the start of the linear
map) that I see on a KASLR-enabled ARM64 kernel running on a board
with compatible EFI firmware (with EFI_RNG_PROTOCOL support).

1. 'arch/arm64/include/asm/memory.h' defines PAGE_OFFSET as:

/*
  * PAGE_OFFSET - the virtual address of the start of the linear map (top
  *         (VA_BITS - 1))
  */
#define PAGE_OFFSET        (UL(0xffffffffffffffff) - \
     (UL(1) << (VA_BITS - 1)) + 1)

So for example on a platform with VA_BITS=48, we have:
PAGE_OFFSET = 0xffff800000000000

2. However, for the KASLR case, 'arch/arm64/kernel/kaslr.c' sets
'memstart_offset_seed' to the top 16 bits of the 'kaslr-seed' in order
to randomize the linear region:

u64 __init kaslr_early_init(u64 dt_phys)
{
<snip..>
     /* use the top 16 bits to randomize the linear region */
     memstart_offset_seed = seed >> 48;
<snip..>
}

3. Now, we use the 'memstart_offset_seed' value to randomize the
'memstart_addr' value in 'arch/arm64/mm/init.c':

void __init arm64_memblock_init(void)
{
<snip..>

     if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
         extern u16 memstart_offset_seed;
         u64 range = linear_region_size -
                 (memblock_end_of_DRAM() - memblock_start_of_DRAM());

         /*
          * If the size of the linear region exceeds, by a sufficient
          * margin, the size of the region that the available physical
          * memory spans, randomize the linear region as well.
          */
         if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
             range = range / ARM64_MEMSTART_ALIGN + 1;
             memstart_addr -= ARM64_MEMSTART_ALIGN *
                      ((range * memstart_offset_seed) >> 16);
         }
     }
<snip..>
}

4. Since 'memstart_addr' indicates the start of physical RAM, it is
randomized based on the 'memstart_offset_seed' value above. Also, the
'memstart_addr' symbol is listed in '/proc/kallsyms', so user-space
applications can locate it and read its value.

5. Since the PAGE_OFFSET value is also used by several user-space
tools (e.g. the makedumpfile tool uses it to determine the start of the
linear region and hence to read PT_NOTE fields from /proc/kcore), I am
not sure how to read its randomized value in the KASLR-enabled case.

6. Reading the code further and adding some debug prints, it seems that
the 'memblock_start_of_DRAM()' value is closer to the actual start of
the linear region than 'memstart_addr' and 'PAGE_OFFSET' are, in the
case of a KASLR-enabled kernel:

[root@qualcomm-amberwing] # dmesg | grep -i "arm64_memblock_init" -A 5

[    0.000000] inside arm64_memblock_init, memstart_addr = ffff976a00000000,
linearstart_addr = ffffe89600200000, memblock_start_of_DRAM = ffffe89600200000,
PHYS_OFFSET = ffff976a00000000, PAGE_OFFSET = ffff800000000000,
KIMAGE_VADDR = ffff000008000000, kimage_vaddr = ffff20c2d7800000

[root@qualcomm-amberwing] # dmesg | grep -i "Virtual kernel memory layout" -A 15
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000   (126847 GB)
[    0.000000]       .text : 0xffff20c2d7880000 - 0xffff20c2d8040000   (  7936 KB)
[    0.000000]     .rodata : 0xffff20c2d8040000 - 0xffff20c2d83a0000   (  3456 KB)
[    0.000000]       .init : 0xffff20c2d83a0000 - 0xffff20c2d8750000   (  3776 KB)
[    0.000000]       .data : 0xffff20c2d8750000 - 0xffff20c2d891b200   (  1837 KB)
[    0.000000]        .bss : 0xffff20c2d891b200 - 0xffff20c2d90a5198   (  7720 KB)
[    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000   (  4544 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000   (   128 GB maximum)
[    0.000000]               0xffff7ffa25800800 - 0xffff7ffa2b800000   (    95 MB actual)
[    0.000000]     memory  : 0xffffe89600200000 - 0xffffe8ae00000000   ( 98302 MB)

As one can see above, the 'memblock_start_of_DRAM()' value of
0xffffe89600200000 represents the start of the linear region:

[    0.000000]     memory  : 0xffffe89600200000 - 0xffffe8ae00000000   ( 98302 MB)

So my question is: to access the start of the linear region (which
could earlier be determined via the PAGE_OFFSET macro), should I:

- do some back-computation of the start of the linear region from
'memstart_addr' in user space, or
- add a new global variable in the kernel, assign it the value of
'memblock_start_of_DRAM()', and export it via '/proc/kallsyms' so that
it can be read by user-space tools, or
- rather look at removing the PAGE_OFFSET usage from the kernel and
replacing it with a global variable that is also updated properly for
the KASLR case.

Kindly share your opinions on what a suitable solution would be in this case.

Thanks for your help.


Hello Bhupesh,

Could you explain what the relevance is of PAGE_OFFSET to userland?
The only thing that should matter is where the actual linear mapping
of DRAM is, and I am not sure I understand why we care about where it
resides relative to the base of the linear region.

Actually, certain user-space tools like makedumpfile (which is used to
generate and compress the vmcore) and crash-utility (which is used to
debug the vmcore) rely on the PAGE_OFFSET value (which denotes the
base of the linear map region) to determine the virtual-to-physical
mapping of addresses lying in the linear region.

One specific use case that I am working on at the moment is
makedumpfile's '--mem-usage' option, which shows how many pages of the
current system (1st kernel) are in each kind of use (please see
MAKEDUMPFILE(8) for more details).

Using this, we can know how many pages are dumpable for each
dump_level passed to makedumpfile.

Normally, makedumpfile analyses the contents of '/proc/kcore' (while
excluding the crashkernel range), and then counts the pages of each
type using the information in vmcoreinfo.

For example, here is the output from my arm64 board (a non-KASLR boot):

     TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
     ----------------------------------------------------------------------
     ZERO            49524                   yes             Pages filled with zero
     NON_PRI_CACHE   15143                   yes             Cache pages without private flag
     PRI_CACHE       29147                   yes             Cache pages with private flag
     USER            3684                    yes             User process pages
     FREE            1450569                 yes             Free pages
     KERN_DATA       14243                   no              Dumpable kernel data

     page size:              65536
     Total pages on system:  1562310
     Total size on system:   102387548160     Byte

This use case requires reading '/proc/kcore' directly, and hence the
PAGE_OFFSET value is used to determine the base address of the linear
region, whose value is not static in the case of a KASLR boot.

Another use case is crash-utility, which uses the PAGE_OFFSET value
to perform virtual-to-physical conversion of addresses lying in the
linear region:

ulong
arm64_VTOP(ulong addr)
{
     if (machdep->flags & NEW_VMEMMAP) {
         if (addr >= machdep->machspec->page_offset)
             return machdep->machspec->phys_offset
                 + (addr - machdep->machspec->page_offset);

<..snip..>
}


Another confusing point is the rounded-down value of 'memstart_addr' in 'arch/arm64/mm/init.c' when booting a non-KASLR kernel where the value of memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN:

void __init arm64_memblock_init(void)
{

<..snip..>
	/*
	 * Select a suitable value for the base of physical memory.
	 */
	memstart_addr = round_down(memblock_start_of_DRAM(),
				   ARM64_MEMSTART_ALIGN);
<..snip..>
}

For example, consider a case (which I see on my Qualcomm board) where memblock_start_of_DRAM() = 0x200000 and ARM64_MEMSTART_ALIGN = 0x40000000 (I am using VA_BITS = 48 and a 64K page size). In this case
memstart_addr is calculated as 0, since the round_down results in a value of 0.

This is in contrast with the definition of the 'memblock_start_of_DRAM':

/* lowest address */
phys_addr_t __init_memblock memblock_start_of_DRAM(void)
{
	return memblock.memory.regions[0].base;
}

As the logs below indicate, the first memblock region starts at 0x200000 rather than at the 'memstart_addr' value (which is 0):

# dmesg | grep -i "Processing" -A 5
[    0.000000] efi: Processing EFI memory map:
[    0.000000] efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
[    0.000000] efi: 0x000000400000-0x0000005fffff [ACPI Memory NVS | | | | | | | | | | | |UC]

# head -1 /proc/iomem
00200000-0021ffff : reserved

Since 'PHYS_OFFSET' is defined as the physical address of the start of memory, it would be 0 in this case:

/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET		({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

On the other hand, the first memblock starts at 0x200000. So my question is: should we update the user-space tools that currently obtain PHYS_OFFSET from the memblocks listed in '/proc/iomem' (by reading the base of the first memblock) so that they somehow read the value of 'memstart_addr' instead, or should the change be made on the kernel side, calculating 'memstart_addr' as:


	/*
	 * Select a suitable value for the base of physical memory.
	 */
	memstart_addr = round_down(memblock_start_of_DRAM(),
				   ARM64_MEMSTART_ALIGN);
	/* Fall back to the first memblock base if rounding down yields 0. */
	if (!memstart_addr)
		memstart_addr = memblock_start_of_DRAM();

Please share your views.

Thanks,
Bhupesh

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec


