Hi Marc,

On 9/2/20 12:10 PM, Marc Zyngier wrote:
> On 2020-09-02 11:59, Alexandru Elisei wrote:
>> Hi,
>>
>> On 8/22/20 3:44 AM, Gavin Shan wrote:
>>> Depending on the kernel configuration, PUD_SIZE could be equal to
>>> PMD_SIZE. For example, both of them are 512MB with the following
>>> kernel configuration, in which case both PUD and PMD are folded
>>> into PGD.
>>>
>>>    CONFIG_ARM64_64K_PAGES   y
>>>    CONFIG_ARM64_VA_BITS     42
>>>    CONFIG_PGTABLE_LEVELS    2
>>>
>>> With the above configuration, the stage2 PUD is used to back the
>>> 512MB huge page when the stage2 mapping is built. During the mapping,
>>> the PUD and its subordinate levels of page table entries are unmapped
>>> in stage2_set_pud_huge() if the PUD is present but not mapped as a
>>> huge page. Unfortunately, @addr isn't aligned to S2_PUD_SIZE, so the
>>> wrong page table entries are zapped. As a result, the PUD's present
>>> bit can't be cleared, and stage2_set_pud_huge() loops forever.
>>>
>>> This fixes the issue by checking against S2_{PUD, PMD}_SIZE instead
>>> of {PUD, PMD}_SIZE to determine whether a stage2 PUD or PMD is used
>>> to back the huge page. For this particular case, the stage2 PMD entry
>>> should be used to back the 512MB huge page with stage2_set_pmd_huge().
>>
>> I can reproduce this on my rockpro64 using kvmtool.
>>
>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB
>> (checked using printk), and second, stage2_set_pud_huge() hangs. I'm
>> working on debugging them.
>
> I have this as an immediate fix for the set_pud_huge hang, tested
> on Seattle with 64k/42bits.
>
> I can't wait to see the back of this code...

The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd()
returns false (CONFIG_PGTABLE_LEVELS = 2):

	pudp = stage2_get_pud(mmu, cache, addr);
	VM_BUG_ON(!pudp);

	old_pud = *pudp;
	[..]
	// Returns 1 because !kvm_stage2_has_pmd()
	if (stage2_pud_present(kvm, old_pud)) {
		/*
		 * If we already have table level mapping for this block, unmap
		 * the range for this block and retry.
		 */
		// Always true because !kvm_stage2_has_pmd()
		if (!stage2_pud_huge(kvm, old_pud)) {
			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
			goto retry;
		}

And we end up jumping back to retry forever.

IMO, in user_mem_abort(), if PUD_SIZE == PMD_SIZE, we should try to map
PMD_SIZE instead of PUD_SIZE. Maybe something like this?

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..178267dec511 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * As for PUD huge maps, we must make sure that we have at least
 	 * 3 levels, i.e, PMD is not folded.
 	 */
-	if (vma_pagesize == PMD_SIZE ||
-	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
+	if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
+		vma_pagesize = PMD_SIZE;
+
+	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm