[+Kristina for the extended idmap stuff] Hi James, On 09/03/18 18:59, James Morse wrote: > Hi Marc, > > On 01/03/18 15:55, Marc Zyngier wrote: >> We so far mapped our HYP IO (which is essencially the GICv2 control > > (Nit: essentially) > > >> registers) using the same method as for memory. It recently appeared >> that is a bit unsafe: >> >> We compute the HYP VA using the kern_hyp_va helper, but that helper >> is only designed to deal with kernel VAs coming from the linear map, >> and not from the vmalloc region... This could in turn cause some bad >> aliasing between the two, amplified by the upcoming VA randomisation. >> >> A solution is to come up with our very own basic VA allocator for >> MMIO. Since half of the HYP address space only contains a single >> page (the idmap), we have plenty to borrow from. Let's use the idmap >> as a base, and allocate downwards from it. GICv2 now lives on the >> other side of the great VA barrier. > >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 0e5cfffb4c21..3074544940dc 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -502,27 +505,31 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size) >> * >> * Assumes hyp_pgd is a page table used strictly in Hyp-mode and >> * therefore contains either mappings in the kernel memory area (above >> - * PAGE_OFFSET), or device mappings in the vmalloc range (from >> - * VMALLOC_START to VMALLOC_END). >> + * PAGE_OFFSET), or device mappings in the idmap range. >> * >> - * boot_hyp_pgd should only map two pages for the init code. >> + * boot_hyp_pgd should only map the idmap range, and is only used in >> + * the extended idmap case. >> */ >> void free_hyp_pgds(void) >> { >> + pgd_t *id_pgd; >> + >> mutex_lock(&kvm_hyp_pgd_mutex); >> >> + id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd; > >> + >> + if (id_pgd) >> + unmap_hyp_range(id_pgd, io_map_base, >> + hyp_idmap_start + PAGE_SIZE - io_map_base); > > Even if kvm_mmu_init() fails before it sets io_map_base, this will still unmap > the idmap. It just starts from 0, so it may take out the flipped PAGE_OFFSET > range too... Yup, definitely worth fixing. > > (using io_map_base without taking io_map_lock makes me nervous ... in practice, > its fine) I'm not too worried about that, as we only have a single CPU performing the teardown. But better safe than sorry, so I'll take it anyway. > > >> + >> if (boot_hyp_pgd) { >> - unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE); >> free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order); >> boot_hyp_pgd = NULL; >> } >> >> if (hyp_pgd) { >> - unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE); >> unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET), >> (uintptr_t)high_memory - PAGE_OFFSET); >> - unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START), >> - VMALLOC_END - VMALLOC_START); >> >> free_pages((unsigned long)hyp_pgd, hyp_pgd_order); >> hyp_pgd = NULL; >> @@ -719,7 +726,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, >> void __iomem **kaddr, >> void __iomem **haddr) >> { >> - unsigned long start, end; >> + pgd_t *pgd = hyp_pgd; >> + unsigned long base; >> int ret; >> >> *kaddr = ioremap(phys_addr, size); >> @@ -731,11 +739,42 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, >> return 0; >> } >> >> + mutex_lock(&io_map_lock); >> + >> + /* >> + * This assumes that we we have enough space below the idmap >> + * page to allocate our VAs. If not, the check below will >> + * kick. A potential alternative would be to detect that >> + * overflow and switch to an allocation above the idmap. >> + */ >> + size = max(PAGE_SIZE, roundup_pow_of_two(size)); >> + base = io_map_base - size; >> + base &= ~(size - 1); >> >> - start = kern_hyp_va((unsigned long)*kaddr); >> - end = kern_hyp_va((unsigned long)*kaddr + size); >> - ret = __create_hyp_mappings(hyp_pgd, PTRS_PER_PGD, start, end, >> - __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE); >> + /* >> + * Verify that BIT(VA_BITS - 1) hasn't been flipped by >> + * allocating the new area, as it would indicate we've >> + * overflowed the idmap/IO address range. >> + */ >> + if ((base ^ io_map_base) & BIT(VA_BITS - 1)) { >> + ret = -ENOMEM; >> + goto out; >> + } >> + >> + if (__kvm_cpu_uses_extended_idmap()) >> + pgd = boot_hyp_pgd; >> + >> + ret = __create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(), >> + base, base + size, >> + __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE); > > (/me winces, that's subtle....) > This __kvm_idmap_ptrs_per_pgd() change is because the hyp_pgd > extended-idmap top-level page may be a pgd bigger than the 64-entries that linux > believes it has for 64K/48bit VA? Yes, together with the 52bit PA madness. > Doesn't unmap_hyp_range() need to know about this too? Otherwise its > pgd_index(hyp_idmap_start) is going to mask out too much of the address, and > pgd_addr_end() will never reach the end address we provided... Hmmm, that's a good point. Kristina, what do you think? > Trying to boot a 64K config, and forcing it into teardown_hyp_mode() leads to > some fireworks: It looks like an unaligned end address is getting into > unmap_hyp_ptes() and its escaping the loop to kvm_set_pte() other kernel data... > > My local changes are below [0], the config is defconfig + 64K pages, this is on > Juno. 4K pages is quite happy with this 'force teardown_hyp_mode()' debug hack. > > Bisects to patch 4: "arm64: KVM: Dynamically patch the kernel/hyp VA mask" > > I'll keep digging on Monday, Nice one! I've reproduced it on a model. It turns out that we only mandate a 4kB alignment for the idmap, whilst we assume page alignment everywhere in the code... Not great. I've fixed it as such: diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 7059c4f34c37..ae4eabd2005d 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1891,7 +1891,9 @@ int kvm_mmu_init(void) int err; hyp_idmap_start = kvm_virt_to_phys(__hyp_idmap_text_start); + hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE); hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end); + hyp_idmap_end = ALIGN(hyp_idmap_end, PAGE_SIZE); hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init); /* Thanks, M. -- Jazz is not dead. It just smells funny...