On Thu, Feb 21, 2013 at 6:01 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> On Wed, 20 Feb 2013 15:37:38 -0800, Christoffer Dall
> <cdall@xxxxxxxxxxxxxxx> wrote:
>> On Wed, Feb 13, 2013 at 03:47:01PM +0000, Marc Zyngier wrote:
>>> user_mem_abort always creates a brand new PTE, even when handling
>>> a permission fault. This is semantically wrong, as such a fault
>>> should be handled by reading the existing entry and modifying it.
>>
>> Why is this semantically wrong?
>
> Because we clear existing bits that could have been set, and set default
> bits that could have been cleared. We should perform a read/modify/write
> cycle instead.
>
>> Where does the notion of what a fault handler should do come from?
>
> Mostly, from what the rest of the kernel does.
>
>> The fault handler needs to make sure that the right page gets mapped in
>> with the appropriate permissions. In fact I think this change is
>> incorrect, see below.
>>
>>>
>>> Convert user_mem_abort to stage2_get_pte()/stage2_set_pte_at(),
>>> giving the opportunity to test the present bit, and only use the
>>> default PTE if nothing was present yet.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>> ---
>>>  arch/arm/kvm/mmu.c | 13 ++++++++++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 8fde75b..d033344 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -519,7 +519,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                           gfn_t gfn, struct kvm_memory_slot *memslot,
>>>                           unsigned long fault_status)
>>>  {
>>> -       pte_t new_pte;
>>> +       pte_t new_pte, *ptep;
>>>         pfn_t pfn;
>>>         int ret;
>>>         bool write_fault, writable;
>>> @@ -553,17 +553,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>         if (is_error_pfn(pfn))
>>>                 return -EFAULT;
>>>
>>> -       new_pte = pfn_pte(pfn, PAGE_S2);
>>>         coherent_icache_guest_page(vcpu->kvm, gfn);
>>>
>>>         spin_lock(&vcpu->kvm->mmu_lock);
>>>         if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
>>>                 goto out_unlock;
>>> +
>>> +       ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
>>> +       if (pte_present(*ptep))
>>> +               new_pte = *ptep;
>>
>> This looks wrong to me.
>>
>> So if KSM merged two pages here and you fault on the page with a write
>> fault, then you're going to just set the writable bit on the page that
>> should still be read-only.
>
> Ah, well spotted.
>
>> Can you explain which problem you're trying to fix here?
>
> It's quite simple, if you think of the last patch which introduces the
> handling of XN faults.
>
> If we introduce the XN handling in a naive way (just adding another test
> for the XN fault, and clearing the XN bit from new_pte), we also clear any
> RW bit that would have been set by a previous handler.
>
> So if you have a page containing code that writes to the same page, you're
> caught in a deadly loop of XN/RO faults.
>
> I could rewrite this code to read:
>
>         if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
>                 new_pte = *ptep;
>
> hence checking that the page hasn't been changed by CoW. That way, we
> default to a brand new RO/XN PTE if the page has been changed.
>
> What do you think?
>
Yeah, that's reasonable.

So we have three cases, if I can summarize properly:

1. Regular fault (stage 2 page not present)
2. Write fault (CoW)
3. XN fault (to clear the I-cache)

I think explaining that these are the cases, and how we handle them, in a
comment is justified at this point.
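For the record, here is a rough sketch of what I'd expect the end result
(comment included) to look like. This is untested pseudocode: exec_fault and
kvm_set_s2pte_exec() are made-up stand-ins for however the XN patch ends up
detecting and clearing XN, and I'm guessing at the stage2_set_pte_at()
argument order.

        ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);

        /*
         * Three cases to handle:
         * 1. Regular fault: no stage-2 mapping yet, or the backing page
         *    changed under us (KSM/CoW) -> start from a fresh RO/XN PTE.
         * 2. Write fault (CoW): keep the existing bits, add write access.
         * 3. XN fault: keep the existing bits (including an RW bit set by
         *    an earlier fault), only clear XN.
         */
        if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
                new_pte = *ptep;                        /* read/modify/write */
        else
                new_pte = pfn_pte(pfn, PAGE_S2);        /* fresh default PTE */

        if (writable)
                kvm_set_s2pte_writable(&new_pte);       /* case 2 */
        if (exec_fault)
                kvm_set_s2pte_exec(&new_pte);           /* case 3, made-up helper */

        stage2_set_pte_at(vcpu->kvm, fault_ipa, ptep, new_pte);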
Thanks,
-Christoffer