On 16/02/18 09:25, Christoffer Dall wrote:
> On Thu, Feb 15, 2018 at 01:52:05PM +0000, Marc Zyngier wrote:
>> On 18/01/18 14:39, Christoffer Dall wrote:
>>> On Thu, Jan 04, 2018 at 06:43:29PM +0000, Marc Zyngier wrote:
>>>> We so far mapped our HYP IO (which is essentially the GICv2 control
>>>> registers) using the same method as for memory. It recently appeared
>>>> that this is a bit unsafe:
>>>>
>>>> We compute the HYP VA using the kern_hyp_va helper, but that helper
>>>> is only designed to deal with kernel VAs coming from the linear map,
>>>> and not from the vmalloc region... This could in turn cause some bad
>>>> aliasing between the two, amplified by the upcoming VA randomisation.
>>>>
>>>> A solution is to come up with our very own basic VA allocator for
>>>> MMIO. Since half of the HYP address space only contains a single
>>>> page (the idmap), we have plenty to borrow from. Let's use the idmap
>>>> as a base, and allocate downwards from it. GICv2 now lives on the
>>>> other side of the great VA barrier.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>>> ---
>>>>  virt/kvm/arm/mmu.c | 56 +++++++++++++++++++++++++++++++++++++++++-------------
>>>>  1 file changed, 43 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 6192d45d1e1a..14c5e5534f2f 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -43,6 +43,9 @@ static unsigned long hyp_idmap_start;
>>>>  static unsigned long hyp_idmap_end;
>>>>  static phys_addr_t hyp_idmap_vector;
>>>>
>>>> +static DEFINE_MUTEX(io_map_lock);
>>>> +static unsigned long io_map_base;
>>>> +
>>>>  #define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
>>>>  #define hyp_pgd_order	get_order(PTRS_PER_PGD * sizeof(pgd_t))
>>>>
>>>> @@ -502,27 +505,31 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
>>>>   *
>>>>   * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>>>>   * therefore contains either mappings in the kernel memory area (above
>>>> - * PAGE_OFFSET), or device mappings in the vmalloc range (from
>>>> - * VMALLOC_START to VMALLOC_END).
>>>> + * PAGE_OFFSET), or device mappings in the idmap range.
>>>>   *
>>>> - * boot_hyp_pgd should only map two pages for the init code.
>>>> + * boot_hyp_pgd should only map the idmap range, and is only used in
>>>> + * the extended idmap case.
>>>>   */
>>>>  void free_hyp_pgds(void)
>>>>  {
>>>> +	pgd_t *id_pgd;
>>>> +
>>>>  	mutex_lock(&kvm_hyp_pgd_mutex);
>>>>
>>>> +	id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
>>>> +
>>>> +	if (id_pgd)
>>>> +		unmap_hyp_range(id_pgd, io_map_base,
>>>> +				hyp_idmap_start + PAGE_SIZE - io_map_base);
>>>> +
>>>>  	if (boot_hyp_pgd) {
>>>> -		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>>>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>>>>  		boot_hyp_pgd = NULL;
>>>>  	}
>>>>
>>>>  	if (hyp_pgd) {
>>>> -		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>>>  		unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
>>>>  				(uintptr_t)high_memory - PAGE_OFFSET);
>>>> -		unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
>>>> -				VMALLOC_END - VMALLOC_START);
>>>>
>>>>  		free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
>>>>  		hyp_pgd = NULL;
>>>> @@ -721,7 +728,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>>>  			   void __iomem **kaddr,
>>>>  			   void __iomem **haddr)
>>>>  {
>>>> -	unsigned long start, end;
>>>> +	pgd_t *pgd = hyp_pgd;
>>>> +	unsigned long base;
>>>>  	int ret;
>>>>
>>>>  	*kaddr = ioremap(phys_addr, size);
>>>> @@ -733,17 +741,38 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>>>  		return 0;
>>>>  	}
>>>>
>>>> +	mutex_lock(&io_map_lock);
>>>> +
>>>> +	base = io_map_base - size;
>>>
>>> are we guaranteed that hyp_idmap_start (and therefore io_map_base) is
>>> sufficiently greater than 0? I suppose that even if RAM starts at 0,
>>> and the kernel was loaded at 0, the idmap page for Hyp would be at some
>>> reasonable offset from the start of the kernel image?
>>
>> On my kernel image:
>> ffff000008080000 t _head
>> ffff000008cc6000 T __hyp_idmap_text_start
>> ffff000009aaa000 B swapper_pg_end
>>
>> __hyp_idmap_text_start is about 12MB from the beginning of the image,
>> and about 14MB from the end. Yes, it is a big kernel. But we're only
>> mapping a few pages there, even with my upcoming crazy vector remapping
>> crap. So the likelihood of this failing is close to zero.
>>
>> Now, close to zero is not necessarily close enough. What I could do is
>> to switch the allocator around on failure, so that if we can't allocate
>> on one side, we can at least try to allocate on the other side. I'm
>> pretty sure we'll never trigger that code, but I can implement it if you
>> think that's worth it.
>>
>
> I don't think we should necessarily implement it; my main concern is
> that, when reading the code, the reader has to convince himself/herself
> why this is always safe (and not just very likely to be safe). I'm fine
> with adding a comment that explains this instead of implementing
> complicated logic though. What do you think?

Oh, absolutely. I'll add a blurb about this.

>
>>>
>>>> +	base &= ~(size - 1);
>>>
>>> I'm not sure I understand this line. Wouldn't it make more sense to use
>>> PAGE_SIZE here?
>>
>> This is trying to align the base of the allocation to its natural size
>> (8kB aligned on an 8kB boundary, for example), which is what other
>> allocators in the kernel do. I've now added a roundup_pow_of_two(size)
>> so that we're guaranteed to deal with those.
>>
>
> Ah right, it's just that I wasn't thinking of the size in terms of
> always being page aligned, so couldn't you here attempt two 4K
> allocations on a 64K page system, where one would overwrite the other?

Ah, I see what you mean. Well spotted. I'll turn that into a
max(roundup_pow_of_two(size), PAGE_SIZE).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
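[Editor's note: a minimal userspace sketch of the downward allocator discussed above, with the two fixes agreed on in this thread folded in: the allocation size is rounded up to a power of two (so the natural-alignment mask is well defined) and never below PAGE_SIZE (so two sub-page allocations on a 64K-page system cannot overlap). The helper name alloc_io_va, the illustrative address values, and the portable roundup_pow_of_two are assumptions for the sketch; the actual patch does this inline in create_hyp_io_mappings() under io_map_lock, with the kernel's own helpers and with bounds checking against running below the usable range.]

```c
#include <stddef.h>

#define PAGE_SIZE 0x10000UL	/* assume 64K pages for illustration */

/* Grows downwards; starts at the HYP idmap base (example value). */
static unsigned long io_map_base = 0xffff000008cc6000UL;

/* Portable stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Allocate 'size' bytes of HYP VA space downwards from io_map_base,
 * naturally aligned. No locking or underflow check in this sketch.
 */
static unsigned long alloc_io_va(size_t size)
{
	/* Never allocate less than a page, and keep sizes power-of-two. */
	unsigned long sz = max_ul(roundup_pow_of_two(size), PAGE_SIZE);
	unsigned long base = io_map_base - sz;

	base &= ~(sz - 1);	/* align base to the allocation's size */
	io_map_base = base;	/* next allocation lands below this one */
	return base;
}
```

With the max() in place, two 4K requests on a 64K-page system each consume a full page, so the second can no longer land inside (and overwrite) the first, which was the bug spotted above.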