Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM

Catalin Marinas <catalin.marinas@xxxxxxx> · Thu, 27 Sep 2012 18:21:23 +0100

On Thu, Sep 27, 2012 at 06:15:05PM +0100, Christoffer Dall wrote:
> On Thu, Sep 27, 2012 at 8:39 AM, Catalin Marinas
> <catalin.marinas@xxxxxxx> wrote:
> > On 25 September 2012 13:38, Christoffer Dall
> > <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>> +
> >>>> +     /*
> >>>> +      * If this is a write fault (think COW) we need to make sure the
> >>>> +      * existing page, which other CPUs might still read, doesn't go
> >>>> away
> >>>> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
> >>>> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which
> >>>> will
> >>>> +      * pin the existing page, then we get a new page for the user space
> >>>> +      * pte and map this in the stage-2 table where we also make sure to
> >>>> +      * flush the TLB for the VM, if there was an existing entry (the
> >>>> entry
> >>>> +      * was updated setting the write flag to the potentially new page).
> >>>> +      */
> >>>> +     if (fault_status == FSC_PERM) {
> >>>> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
> >>>> +             if (is_error_pfn(pfn_existing))
> >>>> +                     return -EFAULT;
> >>>> +     }
> >>>> +
> >>>> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
> >>>> +     if (is_error_pfn(pfn)) {
> >>>> +             ret = -EFAULT;
> >>>> +             goto out_put_existing;
> >>>> +     }
> >>>> +
> >>>> +     /* We need minimum second+third level pages */
> >>>> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
> >>>> +     if (ret)
> >>>> +             goto out;
> >>>> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
> >>>> +     if (writable)
> >>>> +             pte_val(new_pte) |= L_PTE2_WRITE;
> >>>> +     coherent_icache_guest_page(vcpu->kvm, gfn);
> >>>
> >>> why don't you flush icache only when guest has mapped executable page
> >>> as __sync_icache_dcache function does currently?
> >>
> >> because we don't know if the guest will map the page executable. The
> >> guest may read the page through a normal load, which causes the fault,
> >> and subsequently execute it (even possible through different guest
> >> mappings). The only way to see this happening would be to mark all
> >> pages as non-executable and catch the fault when it occurs -
> >> unfortunately the HPFAR which gives us the IPA is not populated on
> >> execute never faults, so we would have to translate the PC's va to ipa
> >> using cp15 functionality when this happens, which is then also racy
> >> with other CPUs.
> >
> > I think you can avoid the race in the stage 2 XN case. In the Hyp
> > exception handler entered because of a stage 2 XN bit you can get the
> > IPA via the CP15 ATS1CPR and PAR registers. If the address translation
> > failed because the same guest running on other CPU changed the stage 1
> > page table, you can simply return to the guest rather than switching
> > to host with incomplete information. The guest may handle its own
> > stage 1 fault and eventually trigger another stage 2 permission and
> > Hyp will try the address translation again. That's a very rare
> > situation, so just returning without handling it would not cause any
> > performance issues.
> >
> you're right that the race is not a big issue, but it's not clear to
> me that the trapping + ATS1CPR will be faster than just flushing
> icache - we'll have to measure this.

I agree, it needs measuring first as it may not be worth the hassle.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html