Re: [PATCH v5 10/23] KVM: arm/arm64: Move HYP IO VAs to the "idmap" range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[+Kristina for the extended idmap stuff]

Hi James,

On 09/03/18 18:59, James Morse wrote:
> Hi Marc,
> 
> On 01/03/18 15:55, Marc Zyngier wrote:
>> We so far mapped our HYP IO (which is essencially the GICv2 control
> 
> (Nit: essentially)
> 
> 
>> registers) using the same method as for memory. It recently appeared
>> that is a bit unsafe:
>>
>> We compute the HYP VA using the kern_hyp_va helper, but that helper
>> is only designed to deal with kernel VAs coming from the linear map,
>> and not from the vmalloc region... This could in turn cause some bad
>> aliasing between the two, amplified by the upcoming VA randomisation.
>>
>> A solution is to come up with our very own basic VA allocator for
>> MMIO. Since half of the HYP address space only contains a single
>> page (the idmap), we have plenty to borrow from. Let's use the idmap
>> as a base, and allocate downwards from it. GICv2 now lives on the
>> other side of the great VA barrier.
> 
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 0e5cfffb4c21..3074544940dc 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -502,27 +505,31 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
>>   *
>>   * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>>   * therefore contains either mappings in the kernel memory area (above
>> - * PAGE_OFFSET), or device mappings in the vmalloc range (from
>> - * VMALLOC_START to VMALLOC_END).
>> + * PAGE_OFFSET), or device mappings in the idmap range.
>>   *
>> - * boot_hyp_pgd should only map two pages for the init code.
>> + * boot_hyp_pgd should only map the idmap range, and is only used in
>> + * the extended idmap case.
>>   */
>>  void free_hyp_pgds(void)
>>  {
>> +	pgd_t *id_pgd;
>> +
>>  	mutex_lock(&kvm_hyp_pgd_mutex);
>>  
>> +	id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
> 
>> +
>> +	if (id_pgd)
>> +		unmap_hyp_range(id_pgd, io_map_base,
>> +				hyp_idmap_start + PAGE_SIZE - io_map_base);
> 
> Even if kvm_mmu_init() fails before it sets io_map_base, this will still unmap
> the idmap. It just starts from 0, so it may take out the flipped PAGE_OFFSET
> range too...

Yup, definitely worth fixing.

> 
> (using io_map_base without taking io_map_lock makes me nervous ... in practice,
> its fine)

I'm not too worried about that, as we only have a single CPU performing
the teardown. But better safe than sorry, so I'll take it anyway.

> 
> 
>> +
>>  	if (boot_hyp_pgd) {
>> -		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>>  		boot_hyp_pgd = NULL;
>>  	}
>>  
>>  	if (hyp_pgd) {
>> -		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  		unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
>>  				(uintptr_t)high_memory - PAGE_OFFSET);
>> -		unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
>> -				VMALLOC_END - VMALLOC_START);
>>  
>>  		free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
>>  		hyp_pgd = NULL;
>> @@ -719,7 +726,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>  			   void __iomem **kaddr,
>>  			   void __iomem **haddr)
>>  {
>> -	unsigned long start, end;
>> +	pgd_t *pgd = hyp_pgd;
>> +	unsigned long base;
>>  	int ret;
>>  
>>  	*kaddr = ioremap(phys_addr, size);
>> @@ -731,11 +739,42 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>  		return 0;
>>  	}
>>  
>> +	mutex_lock(&io_map_lock);
>> +
>> +	/*
>> +	 * This assumes that we we have enough space below the idmap
>> +	 * page to allocate our VAs. If not, the check below will
>> +	 * kick. A potential alternative would be to detect that
>> +	 * overflow and switch to an allocation above the idmap.
>> +	 */
>> +	size = max(PAGE_SIZE, roundup_pow_of_two(size));
>> +	base = io_map_base - size;
>> +	base &= ~(size - 1);
>>  
>> -	start = kern_hyp_va((unsigned long)*kaddr);
>> -	end = kern_hyp_va((unsigned long)*kaddr + size);
>> -	ret = __create_hyp_mappings(hyp_pgd, PTRS_PER_PGD, start, end,
>> -				     __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
>> +	/*
>> +	 * Verify that BIT(VA_BITS - 1) hasn't been flipped by
>> +	 * allocating the new area, as it would indicate we've
>> +	 * overflowed the idmap/IO address range.
>> +	 */
>> +	if ((base ^ io_map_base) & BIT(VA_BITS - 1)) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	if (__kvm_cpu_uses_extended_idmap())
>> +		pgd = boot_hyp_pgd;
>> +
>> +	ret = __create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(),
>> +				    base, base + size,
>> +				    __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
> 
> (/me winces, that's subtle....)
> This __kvm_idmap_ptrs_per_pgd() change is because the hyp_pgd
> extended-idmap top-level page may be a pgd bigger than the 64-entries that linux
> believes it has for 64K/48bit VA?

Yes, together with the 52bit PA madness.

> Doesn't unmap_hyp_range() need to know about this too? Otherwise its
> pgd_index(hyp_idmap_start) is going to mask out too much of the address, and
> pgd_addr_end() will never reach the end address we provided...

Hmmm, that's a good point. Kristina, what do you think?

> Trying to boot a 64K config, and forcing it into teardown_hyp_mode() leads to
> some fireworks: It looks like an unaligned end address is getting into
> unmap_hyp_ptes() and its escaping the loop to kvm_set_pte() other kernel data...
> 
> My local changes are below [0], the config is defconfig + 64K pages, this is on
> Juno. 4K pages is quite happy with this 'force teardown_hyp_mode()' debug hack.
> 
> Bisects to patch 4: "arm64: KVM: Dynamically patch the kernel/hyp VA mask"
> 
> I'll keep digging on Monday,

Nice one! I've reproduced it on a model. It turns out that we only
mandate a 4kB alignment for the idmap, whilst we assume page alignment
everywhere in the code... Not great. I've fixed it as such:

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7059c4f34c37..ae4eabd2005d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1891,7 +1891,9 @@ int kvm_mmu_init(void)
 	int err;

 	hyp_idmap_start = kvm_virt_to_phys(__hyp_idmap_text_start);
+	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
 	hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
+	hyp_idmap_end = ALIGN(hyp_idmap_end, PAGE_SIZE);
 	hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);

 	/*

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux