On Thu, Aug 22, 2024 at 03:11:16AM +0800, Zenghui Yu wrote:
> > +
> > +        if (nested) {
> > +                unsigned long max_map_size;
> > +
> > +                max_map_size = force_pte ? PAGE_SIZE : PUD_SIZE;
> > +
> > +                ipa = kvm_s2_trans_output(nested);
> > +
> > +                /*
> > +                 * If we're about to create a shadow stage 2 entry, then we
> > +                 * can only create a block mapping if the guest stage 2 page
> > +                 * table uses at least as big a mapping.
> > +                 */
> > +                max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> > +
> > +                /*
> > +                 * Be careful that if the mapping size falls between
> > +                 * two host sizes, take the smallest of the two.
> > +                 */
> > +                if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
> > +                        max_map_size = PMD_SIZE;
> > +                else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
> > +                        max_map_size = PAGE_SIZE;
> > +
> > +                force_pte = (max_map_size == PAGE_SIZE);
> > +                vma_pagesize = min(vma_pagesize, (long)max_map_size);
> > +        }
> > +
> >         if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> >                 fault_ipa &= ~(vma_pagesize - 1);
> >
> > -        gfn = fault_ipa >> PAGE_SHIFT;
> > +        gfn = ipa >> PAGE_SHIFT;
>
> I had seen a non-nested guest boot failure (with vma_pagesize ==
> PUD_SIZE) and bisection led me here.
>
> Is it intentional to ignore the fault_ipa adjustment when calculating
> gfn if the guest memory is backed by hugetlbfs? This looks broken for
> the non-nested case.
>
> But since I haven't looked at user_mem_abort() for a long time, I'm
> not sure if I'd missed something...

Nope, you're spot on as usual. Seems like we'd want to make sure both
the canonical IPA and fault IPA are hugepage-aligned to get the right
PFN and map it at the right place.

I repro'ed the boot failure; the following diff gets me back in
business. I was _just_ about to send the second batch of fixes, but
this is a rather smelly one. Unless someone screams, this is getting
stuffed on top.

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6981b1bc0946..a509b63bd4dd 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1540,8 +1540,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                 vma_pagesize = min(vma_pagesize, (long)max_map_size);
         }
 
-        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
+        /*
+         * Both the canonical IPA and fault IPA must be hugepage-aligned to
+         * ensure we find the right PFN and lay down the mapping in the right
+         * place.
+         */
+        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
                 fault_ipa &= ~(vma_pagesize - 1);
+                ipa &= ~(vma_pagesize - 1);
+        }
 
         gfn = ipa >> PAGE_SHIFT;
         mte_allowed = kvm_vma_mte_allowed(vma);

-- 
Thanks,
Oliver
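
[Editor's illustration] To make the alignment bug concrete, here is a
minimal userspace sketch, not kernel code: the addresses are made up,
and PMD_SIZE is hardcoded for a 2MiB block with 4K pages. It shows how
deriving gfn from an unaligned canonical IPA pairs a mid-block PFN
with a block-aligned mapping base:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PMD_SIZE   (1ULL << 21)   /* 2MiB block, assuming 4K pages */

int main(void)
{
        /* Hypothetical fault, 0x123000 bytes into a 2MiB block. */
        uint64_t fault_ipa = 0x40123000;  /* where the mapping is installed */
        uint64_t ipa       = 0x80123000;  /* canonical IPA, used to find the PFN */

        /* Before the fix: only fault_ipa was block-aligned, so the
         * gfn derived from ipa named a page in the middle of the block. */
        uint64_t bad_gfn = ipa >> PAGE_SHIFT;

        /* After the fix: both are aligned, so the PFN looked up via
         * gfn and the IPA the block covers share the same base. */
        fault_ipa &= ~(PMD_SIZE - 1);
        ipa &= ~(PMD_SIZE - 1);
        uint64_t good_gfn = ipa >> PAGE_SHIFT;

        printf("block at fault_ipa 0x%llx: unaligned gfn 0x%llx, aligned gfn 0x%llx\n",
               (unsigned long long)fault_ipa,
               (unsigned long long)bad_gfn,
               (unsigned long long)good_gfn);
        return 0;
}

This prints "block at fault_ipa 0x40000000: unaligned gfn 0x80123,
aligned gfn 0x80000": installing a 2MiB block at the aligned fault_ipa
while taking the PFN from the unaligned gfn shifts the guest's view of
that block by the in-block offset, which is consistent with the boot
failure Zenghui saw.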