Hi Marc,

On 9/2/20 1:04 PM, Marc Zyngier wrote:
> On 2020-09-02 12:53, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> On 9/2/20 12:10 PM, Marc Zyngier wrote:
>>> On 2020-09-02 11:59, Alexandru Elisei wrote:
>>>> Hi,
>>>>
>>>> On 8/22/20 3:44 AM, Gavin Shan wrote:
>>>>> Depending on the kernel configuration, PUD_SIZE could be equal to
>>>>> PMD_SIZE. For example, both of them are 512MB with the following
>>>>> kernel configuration. In this case, both PUD and PMD are folded
>>>>> into the PGD.
>>>>>
>>>>>   CONFIG_ARM64_64K_PAGES   y
>>>>>   CONFIG_ARM64_VA_BITS     42
>>>>>   CONFIG_PGTABLE_LEVELS    2
>>>>>
>>>>> With the above configuration, the stage2 PUD is used to back the
>>>>> 512MB huge page when the stage2 mapping is built. During the mapping,
>>>>> the PUD and its subordinate levels of page table entries are unmapped
>>>>> in stage2_set_pud_huge() if the PUD is present and not a huge page.
>>>>> Unfortunately, @addr isn't aligned to S2_PUD_SIZE, so the wrong page
>>>>> table entries are zapped. As a result, the PUD's present bit can never
>>>>> be cleared and stage2_set_pud_huge() ends up in an infinite loop.
>>>>>
>>>>> This fixes the issue by checking against S2_{PUD, PMD}_SIZE instead of
>>>>> {PUD, PMD}_SIZE to determine whether the stage2 PUD or PMD is used to
>>>>> back the huge page. For this particular case, the stage2 PMD entry
>>>>> should be used to back the 512MB huge page, via stage2_set_pmd_huge().
>>>>
>>>> I can reproduce this on my rockpro64 using kvmtool.
>>>>
>>>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB
>>>> (checked using printk), and second, stage2_set_pud_huge() hangs. I'm
>>>> working on debugging them.
>>>
>>> I have this as an immediate fix for the set_pud_huge hang, tested
>>> on Seattle with 64k/42bits.
>>>
>>> I can't wait to see the back of this code...
>>
>> The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd()
>> returns false (CONFIG_PGTABLE_LEVELS = 2):
>>
>>     pudp = stage2_get_pud(mmu, cache, addr);
>>     VM_BUG_ON(!pudp);
>>
>>     old_pud = *pudp;
>>
>>     [..]
>>
>>     // Returns 1 because !kvm_stage2_has_pmd()
>>     if (stage2_pud_present(kvm, old_pud)) {
>>         /*
>>          * If we already have table level mapping for this block, unmap
>>          * the range for this block and retry.
>>          */
>>         if (!stage2_pud_huge(kvm, old_pud)) { // Always true because !kvm_stage2_has_pmd()
>>             unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
>>             goto retry;
>>         }
>>
>> And we end up jumping back to retry forever. IMO, in user_mem_abort(),
>> if PUD_SIZE == PMD_SIZE, we should try to map PMD_SIZE instead of
>> PUD_SIZE. Maybe something like this?
>
> Err... If PUD_SIZE == PMD_SIZE, what difference does it make to map
> one or the other?
>
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index ba00bcc0c884..178267dec511 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>           * As for PUD huge maps, we must make sure that we have at least
>>           * 3 levels, i.e, PMD is not folded.
>>           */
>> -        if (vma_pagesize == PMD_SIZE ||
>> -            (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>> +        if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
>> +                vma_pagesize = PMD_SIZE;
>> +
>> +        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>>                  gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>>          mmap_read_unlock(current->mm);
>
> I don't think this solves anything in the 2 level case. The gist of
> the issue is that if we go on the PUD path, we end up computing the
> wrong offset for the entry and either loop or do something silly.
> In either case, we read/write something we have no business touching.

In my testing, I seem to be getting the right offset.
For fault_ipa 0x8000 0000, in stage2_set_pud_huge() I get pudp = mmu->pgd + 0x20,
which looks like the correct address for the table entry (IPA[41:29] = 4). That
address is valid because in kvm_init_stage2_mmu() we allocate stage2_pgd_size =
16K (40-bit IPA, so we only need to map bits [39:29] in the level 2 table).

The reason I am seeing the hang is what I tried to explain above:
kvm_stage2_has_pmd() is always false, which means that in stage2_set_pud_huge(),
stage2_pud_present() is always true and stage2_pud_huge() is always false, which
leads to jumping to the retry label in an infinite loop.

>
> Gavin sidesteps the problem by using a fantasy PUD size, while I
> catch it by explicitly checking for the PMD == PUD case.

My mistake, I shouldn't have posted a diff without testing it. You're right, my
change fixes nothing because vma_pagesize still equals PUD_SIZE == PMD_SIZE.

I tested your patch and the hang goes away. If you decide to keep your patch as-is:

Tested-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>
Reviewed-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>

As an aside, a comment explaining why the check for PUD_SIZE != PMD_SIZE is
needed could make it easier for other people to understand what is going on.

Thanks,
Alex

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm