On Thu, 21 Feb 2013 10:26:08 -0500, Christoffer Dall <cdall@xxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 21, 2013 at 6:01 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On Wed, 20 Feb 2013 15:37:38 -0800, Christoffer Dall
>> <cdall@xxxxxxxxxxxxxxx> wrote:
>>> On Wed, Feb 13, 2013 at 03:47:01PM +0000, Marc Zyngier wrote:
>>>> user_mem_abort always creates a brand new PTE, even when handling
>>>> a permission fault. This is semantically wrong, as such a fault
>>>> should be handled by reading the existing entry and modifying it.
>>>
>>> Why is this semantically wrong?
>>
>> Because we clear existing bits that could have been set, and set default
>> bits that could have been cleared. We should perform a read/modify/write
>> cycle instead.
>>
>>> Where does the notion of what a fault handler should do come from?
>>
>> Mostly, from what the rest of the kernel does.
>>
>>> The fault handler needs to make sure that the right page gets mapped in
>>> with the appropriate permissions. In fact I think this change is
>>> incorrect, see below.
>>>
>>>>
>>>> Convert user_mem_abort to stage2_get_pte()/stage2_set_pte_at(),
>>>> giving us the opportunity to test the present bit, and to only use
>>>> the default PTE if nothing was present yet.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>>> ---
>>>>  arch/arm/kvm/mmu.c | 13 ++++++++++---
>>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 8fde75b..d033344 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -519,7 +519,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  gfn_t gfn, struct kvm_memory_slot *memslot,
>>>>  			  unsigned long fault_status)
>>>>  {
>>>> -	pte_t new_pte;
>>>> +	pte_t new_pte, *ptep;
>>>>  	pfn_t pfn;
>>>>  	int ret;
>>>>  	bool write_fault, writable;
>>>> @@ -553,17 +553,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	if (is_error_pfn(pfn))
>>>>  		return -EFAULT;
>>>>
>>>> -	new_pte = pfn_pte(pfn, PAGE_S2);
>>>>  	coherent_icache_guest_page(vcpu->kvm, gfn);
>>>>
>>>>  	spin_lock(&vcpu->kvm->mmu_lock);
>>>>  	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
>>>>  		goto out_unlock;
>>>> +
>>>> +	ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
>>>> +	if (pte_present(*ptep))
>>>> +		new_pte = *ptep;
>>>
>>> This looks wrong to me.
>>>
>>> So if KSM merged two pages here and you fault on the page with a write
>>> fault, then you're going to just set the writable bit on the page that
>>> should still be read-only.
>>
>> Ah, well spotted.
>>
>>> Can you explain which problem you're trying to fix here?
>>
>> It's quite simple, if you think of the last patch, which introduces the
>> handling of XN faults.
>>
>> If we introduce the XN handling in a naive way (just adding another test
>> for the XN fault and clearing the XN bit from new_pte), we also clear any
>> RW bit that a previous handler may have set.
>>
>> So if you have a page containing code that writes to the same page, you
>> are caught in a deadly loop of XN/RO faults.
>>
>> I could rewrite this code to read:
>>
>> 	if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
>> 		new_pte = *ptep;
>>
>> hence checking that the page hasn't been changed by CoW. That way, we
>> default to a brand new RO/XN PTE if the page has been changed.
>>
>> What do you think?
>>
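
For what it's worth, here is a rough, untested sketch of how that
read/modify/write path could look in user_mem_abort. stage2_get_pte(),
pte_pfn() and pfn_pte(pfn, PAGE_S2) come from the patch and the discussion
above; kvm_set_s2pte_writable() is assumed to be the helper already used
for write faults, exec_fault and kvm_set_s2pte_exec() stand in for whatever
the XN patch introduces, and the stage2_set_pte_at() arguments are only
guessed from the commit message:

	/* Sketch only: error handling and locking details omitted. */
	ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
	if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
		new_pte = *ptep;	/* permission fault: keep existing bits */
	else
		new_pte = pfn_pte(pfn, PAGE_S2); /* new or CoW'd page: fresh RO/XN PTE */

	if (writable)
		kvm_set_s2pte_writable(&new_pte); /* write fault: set RW */
	if (exec_fault)
		kvm_set_s2pte_exec(&new_pte);	/* XN fault: clear XN (placeholder helper) */

	stage2_set_pte_at(vcpu->kvm, fault_ipa, ptep, new_pte);

The pte_pfn() check is what keeps the CoW case safe: a page that has been
replaced underneath us (e.g. by breaking a KSM merge) gets a brand new
RO/XN entry instead of inheriting stale permissions.
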
> yeah, that's reasonable. So we have three cases, if I can summarize
> them properly:
>
> 1. Regular fault (stage 2 page not present)
> 2. Write fault (CoW)
> 3. XN fault (to clear the I-cache)
>
> I think a comment explaining that these are the cases and how we handle
> them is justified at this point.

Sure. I'll add that as I revisit the series.

	M.
--
Fast, cheap, reliable. Pick two.
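
As an illustration of the comment being asked for, a sketch summarizing the
three cases above (the wording is a guess, not the comment that eventually
went into the series):

	/*
	 * Three kinds of faults bring us here:
	 *
	 * 1. Regular translation fault: no stage-2 PTE exists yet, so
	 *    install a fresh read-only, non-executable mapping of the pfn.
	 * 2. Write (CoW) fault: the pfn backing the page may have changed
	 *    (e.g. a broken KSM merge), so only reuse the existing PTE if
	 *    it still maps the same pfn, then set the writable bit.
	 * 3. XN fault: the page was mapped non-executable so the I-cache
	 *    could be cleaned lazily; keep the existing permissions and
	 *    just clear the XN bit.
	 */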