Re: [RFC PATCH 4/7] ARM: KVM: fix user_mem_abort() use of stage2_set_pte

On Wed, 20 Feb 2013 15:37:38 -0800, Christoffer Dall
<cdall@xxxxxxxxxxxxxxx> wrote:
> On Wed, Feb 13, 2013 at 03:47:01PM +0000, Marc Zyngier wrote:
>> user_mem_abort always creates a brand new PTE, even when handling
>> a permission fault. This is semantically wrong, as such a fault
>> should be handled by reading the existing entry and modifying it.
> 
> Why is this semantically wrong?

Because we clear existing bits that could have been set, and set default
bits that could have been cleared. We should perform a read/modify/write
cycle instead.
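
Roughly something like this (just a sketch of the idea; the final
write-back depends on what stage2_set_pte_at() ends up looking like in
this series, so that step is only hinted at in a comment):

        ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
        if (pte_present(*ptep))
                new_pte = *ptep;        /* read: keep previously set bits */
        else
                new_pte = pfn_pte(pfn, PAGE_S2); /* default RO entry */
        if (writable)
                kvm_set_s2pte_writable(&new_pte); /* modify what we need */
        /* ... then write the result back via stage2_set_pte_at() */

which is what the hunk further down does, instead of unconditionally
starting from pfn_pte(pfn, PAGE_S2).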

> Where does the notion of what a fault handler should do come from?

Mostly, from what the rest of the kernel does.

> The fault handler needs to make sure that the right page gets mapped in
> with the appropriate permissions.  In fact I think this change is
> incorrect, see below.
> 
>> 
>> Convert user_mem_abort to stage2_get_pte()/stage2_set_pte_at(),
>> giving the opportunity to test the present bit, and only use the
>> default PTE if nothing was present yet.
>> 
>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>> ---
>>  arch/arm/kvm/mmu.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 8fde75b..d033344 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -519,7 +519,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  gfn_t gfn, struct kvm_memory_slot *memslot,
>>  			  unsigned long fault_status)
>>  {
>> -	pte_t new_pte;
>> +	pte_t new_pte, *ptep;
>>  	pfn_t pfn;
>>  	int ret;
>>  	bool write_fault, writable;
>> @@ -553,17 +553,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	if (is_error_pfn(pfn))
>>  		return -EFAULT;
>>  
>> -	new_pte = pfn_pte(pfn, PAGE_S2);
>>  	coherent_icache_guest_page(vcpu->kvm, gfn);
>>  
>>  	spin_lock(&vcpu->kvm->mmu_lock);
>>  	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
>>  		goto out_unlock;
>> +
>> +	ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
>> +	if (pte_present(*ptep))
>> +		new_pte = *ptep;
> 
> This looks wrong to me.
> 
> So if KSM merged two pages here and you fault on the page with a write
> fault, then you're going to just set the writable bit on the page that
> should still be read-only.

Ah, well spotted.

> Can you explain which problem you're trying to fix here?

It's quite simple if you think of the last patch, which introduces the
handling of XN faults.

If we introduce the XN handling in a naive way (just adding another test
for the XN fault and clearing the XN bit from new_pte), we also clear any
RW bit that would have been set by a previous pass through the handler.

So if you have a page containing code that writes to the same page, you're
caught in a deadly loop of XN/RO faults.
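
To make the loop concrete (illustrative only; kvm_s2pte_mkexec() is just
a made-up name for whatever XN-clearing helper that patch would add):

        /* instruction fetch faults on XN: rebuild the PTE from scratch */
        new_pte = pfn_pte(pfn, PAGE_S2);   /* RO + XN */
        kvm_s2pte_mkexec(&new_pte);        /* now RO + X */
        ...
        /* the guest then writes to the same page: write fault */
        new_pte = pfn_pte(pfn, PAGE_S2);   /* RO + XN again, X is lost */
        kvm_set_s2pte_writable(&new_pte);  /* now RW + XN */
        /* the next instruction fetch faults on XN again, and so on */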

I could rewrite this code to read:

        if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
                new_pte = *ptep;

hence checking that the page hasn't been changed by CoW. That way, we
default to a brand new RO/XN PTE if the page has been changed.
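
Folded into the hunk above, that would look something like this (same
caveats as before, just a sketch):

        ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
        if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
                new_pte = *ptep;        /* same page: keep accumulated bits */
        else
                new_pte = pfn_pte(pfn, PAGE_S2); /* new page: start from RO/XN */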

What do you think?

        M.
-- 
Fast, cheap, reliable. Pick two.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

