On 2020-09-02 12:53, Alexandru Elisei wrote:
Hi Marc,
On 9/2/20 12:10 PM, Marc Zyngier wrote:
On 2020-09-02 11:59, Alexandru Elisei wrote:
Hi,
On 8/22/20 3:44 AM, Gavin Shan wrote:
Depending on the kernel configuration, PUD_SIZE could be equal to
PMD_SIZE. For example, both of them are 512MB with the following
kernel configuration. In this case, both PUD and PMD are folded
to PGD.
CONFIG_ARM64_64K_PAGES y
CONFIG_ARM64_VA_BITS 42
CONFIG_PGTABLE_LEVELS 2
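(As a cross-check of the folding, here is a minimal userspace sketch of the arithmetic; the shift formula mirrors the generic arm64 pgtable-hwdef definitions and is an assumption of this illustration, not part of the patch:)

#include <stdio.h>

int main(void)
{
	const unsigned int page_shift = 16;	/* CONFIG_ARM64_64K_PAGES */
	const unsigned int levels = 2;		/* CONFIG_PGTABLE_LEVELS */

	/* Each translation level resolves (page_shift - 3) bits of the VA. */
	const unsigned int pgdir_shift =
		page_shift + (page_shift - 3) * (levels - 1);

	/* 16 + 13 = 29, i.e. 512MB; with only 2 levels the PUD and PMD fold
	 * into the PGD, so PUD_SIZE == PMD_SIZE == PGDIR_SIZE. */
	printf("PGDIR/PUD/PMD size = %luMB\n", (1UL << pgdir_shift) >> 20);
	return 0;
}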
With the above configuration, the stage2 PUD is used to back the 512MB
huge page when the stage2 mapping is built. During the mapping, the PUD
and its subordinate levels of page table entries are unmapped by
stage2_set_pud_huge() if the PUD is present but not a huge page entry.
Unfortunately, @addr isn't aligned to S2_PUD_SIZE, so the wrong page
table entries are zapped. As a result, the PUD's present bit can never
be cleared and stage2_set_pud_huge() spins in an infinite loop.

This fixes the issue by checking against S2_{PUD, PMD}_SIZE instead of
{PUD, PMD}_SIZE to determine whether a stage2 PUD or PMD is used to back
the huge page. For this particular case, the stage2 PMD entry should be
used to back the 512MB huge page with stage2_set_pmd_huge().
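(Purely as an illustration of the check described above, not the actual patch, which isn't quoted in this excerpt; the condition is borrowed from the user_mem_abort() hunk quoted later in the thread:)

	/* Sketch only: pick the huge page level against the stage2 sizes
	 * rather than the host's (possibly folded) PUD_SIZE/PMD_SIZE. */
	if (vma_pagesize == S2_PMD_SIZE ||
	    (vma_pagesize == S2_PUD_SIZE && kvm_stage2_has_pmd(kvm)))
		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;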
I can reproduce this on my rockpro64 using kvmtool.

I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB
(checked using printk); and second, stage2_set_pud_huge() hangs. I'm
working on debugging them.
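(A minimal sketch of such a printk check; the placement, e.g. early in user_mem_abort(), and the exact format are assumptions of this illustration, not the actual debug change:)

	/* Hypothetical debug aid: dump the host and stage2 block sizes so
	 * they can be compared on this configuration. */
	pr_info("PUD_SIZE=0x%lx S2_PUD_SIZE=0x%lx PMD_SIZE=0x%lx S2_PMD_SIZE=0x%lx\n",
		PUD_SIZE, S2_PUD_SIZE, PMD_SIZE, S2_PMD_SIZE);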
I have this as an immediate fix for the set_pud_huge hang, tested on
Seattle with 64k/42bits.

I can't wait to see the back of this code...
The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd()
returns false (CONFIG_PGTABLE_LEVELS = 2):

	pudp = stage2_get_pud(mmu, cache, addr);
	VM_BUG_ON(!pudp);

	old_pud = *pudp;

	[..]

	// Returns 1 because !kvm_stage2_has_pmd()
	if (stage2_pud_present(kvm, old_pud)) {
		/*
		 * If we already have table level mapping for this block, unmap
		 * the range for this block and retry.
		 */
		// Always true because !kvm_stage2_has_pmd()
		if (!stage2_pud_huge(kvm, old_pud)) {
			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
			goto retry;
		}
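(For context, the behaviour noted in the two inline comments comes from the stage2 fallbacks for folded levels; roughly, paraphrased from the stage2_pgtable.h of that era, so treat the exact shape as an approximation:)

	static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud)
	{
		if (kvm_stage2_has_pmd(kvm))
			return pud_present(pud);
		else
			return true;	/* PMD folded: always reported present */
	}

	static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)
	{
		if (kvm_stage2_has_pmd(kvm))
			return pud_huge(pud);
		else
			return false;	/* PMD folded: never a huge PUD */
	}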
And we end up jumping back to retry forever. IMO, in user_mem_abort(),
if PUD_SIZE == PMD_SIZE, we should try to map PMD_SIZE instead of
PUD_SIZE. Maybe something like this?
Err... If PUD_SIZE == PMD_SIZE, what difference does it make to map
one or the other?
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..178267dec511 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * As for PUD huge maps, we must make sure that we have at least
 	 * 3 levels, i.e, PMD is not folded.
 	 */
-	if (vma_pagesize == PMD_SIZE ||
-	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
+	if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
+		vma_pagesize = PMD_SIZE;
+
+	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);
I don't think this solves anything in the 2-level case. The gist of
the issue is that if we go on the PUD path, we end up computing the
wrong offset for the entry and either loop or do something silly.
In either case, we read/write something we have no business touching.
Gavin sidesteps the problem by using a fantasy PUD size, while I
catch it by explicitly checking for the PMD == PUD case.
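(To make the "wrong offset" point concrete, a small userspace sketch using the sizes reported earlier in the thread, 512MB for the host PUD/PMD versus the 4TB S2_PUD_SIZE seen via printk; the fault address is made up for illustration and this only shows the mask/size used by the unmap in the quoted stage2_set_pud_huge() path:)

#include <stdio.h>

int main(void)
{
	const unsigned long pud_size    = 512UL << 20;  /* host PUD/PMD: 512MB */
	const unsigned long s2_pud_size = 4UL << 40;    /* reported S2_PUD_SIZE: 4TB */
	const unsigned long addr        = 0x65430000UL; /* made-up fault IPA */

	/* Range zapped on the PUD path with the oversized S2_PUD_SIZE. */
	printf("S2_PUD base = 0x%lx, range = %luGB\n",
	       addr & ~(s2_pud_size - 1), s2_pud_size >> 30);

	/* Range a correctly sized 512MB block unmap would have covered. */
	printf("512MB block base = 0x%lx, range = %luMB\n",
	       addr & ~(pud_size - 1), pud_size >> 20);
	return 0;
}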
M.
--
Jazz is not dead. It just smells funny...