On Thu, 21 Feb 2013 10:26:08 -0500, Christoffer Dall <cdall@xxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 21, 2013 at 6:01 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On Wed, 20 Feb 2013 15:37:38 -0800, Christoffer Dall
>> <cdall@xxxxxxxxxxxxxxx> wrote:
>>> On Wed, Feb 13, 2013 at 03:47:01PM +0000, Marc Zyngier wrote:
>>>> user_mem_abort always creates a brand new PTE, even when handling
>>>> a permission fault. This is semantically wrong, as such a fault
>>>> should be handled by reading the existing entry and modifying it.
>>>
>>> Why is this semantically wrong?
>>
>> Because we clear existing bits that could have been set, and set default
>> bits that could have been cleared. We should perform a read/modify/write
>> cycle instead.
>>
>>> Where does the notion of what a fault handler should do come from?
>>
>> Mostly, from what the rest of the kernel does.
>>
>>> The fault handler needs to make sure that the right page gets mapped in
>>> with the appropriate permissions. In fact I think this change is
>>> incorrect, see below.
>>>
>>>>
>>>> Convert user_mem_abort to stage2_get_pte()/stage2_set_pte_at(),
>>>> giving us the opportunity to test the present bit, and to only use
>>>> the default PTE if nothing was present yet.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>>> ---
>>>>  arch/arm/kvm/mmu.c | 13 ++++++++++---
>>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 8fde75b..d033344 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -519,7 +519,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  gfn_t gfn, struct kvm_memory_slot *memslot,
>>>>  			  unsigned long fault_status)
>>>>  {
>>>> -	pte_t new_pte;
>>>> +	pte_t new_pte, *ptep;
>>>>  	pfn_t pfn;
>>>>  	int ret;
>>>>  	bool write_fault, writable;
>>>> @@ -553,17 +553,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	if (is_error_pfn(pfn))
>>>>  		return -EFAULT;
>>>>
>>>> -	new_pte = pfn_pte(pfn, PAGE_S2);
>>>>  	coherent_icache_guest_page(vcpu->kvm, gfn);
>>>>
>>>>  	spin_lock(&vcpu->kvm->mmu_lock);
>>>>  	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
>>>>  		goto out_unlock;
>>>> +
>>>> +	ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
>>>> +	if (pte_present(*ptep))
>>>> +		new_pte = *ptep;
>>>
>>> This looks wrong to me.
>>>
>>> So if KSM merged two pages here and you fault on the page with a write
>>> fault, then you're going to just set the writable bit on the page that
>>> should still be read-only.
>>
>> Ah, well spotted.
>>
>>> Can you explain which problem you're trying to fix here?
>>
>> It's quite simple, if you think of the last patch, which introduces the
>> handling of XN faults.
>>
>> If we introduce the XN handling in a naive way (just adding another test
>> for the XN fault and clearing the XN bit from new_pte), we also clear any
>> RW bit that a previous handler may have set.
>>
>> So if you have a page containing code that writes to the same page, you
>> are caught in a deadly loop of XN/RO faults.
>>
>> I could rewrite this code to read:
>>
>> 	if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
>> 		new_pte = *ptep;
>>
>> hence checking that the page hasn't been changed by CoW. That way, we
>> default to a brand new RO/XN PTE if the page has been changed.
>>
>> What do you think?
>>
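
For what it's worth, here is a rough, untested sketch of how that
read/modify/write path could look in user_mem_abort. stage2_get_pte(),
pte_pfn() and pfn_pte(pfn, PAGE_S2) come from the patch and the discussion
above; kvm_set_s2pte_writable() is assumed to be the helper already used
for write faults, exec_fault and kvm_set_s2pte_exec() stand in for whatever
the XN patch introduces, and the stage2_set_pte_at() arguments are only
guessed from the commit message:

	/* Sketch only: error handling and locking details omitted. */
	ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
	if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
		new_pte = *ptep;	/* permission fault: keep existing bits */
	else
		new_pte = pfn_pte(pfn, PAGE_S2); /* new or CoW'd page: fresh RO/XN PTE */

	if (writable)
		kvm_set_s2pte_writable(&new_pte); /* write fault: set RW */
	if (exec_fault)
		kvm_set_s2pte_exec(&new_pte);	/* XN fault: clear XN (placeholder helper) */

	stage2_set_pte_at(vcpu->kvm, fault_ipa, ptep, new_pte);

The pte_pfn() check is what keeps the CoW case safe: a page that has been
replaced underneath us (e.g. by breaking a KSM merge) gets a brand new
RO/XN entry instead of inheriting stale permissions.
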
> yeah, that's reasonable. So we have three cases, if I can summarize
> them properly:
>
> 1. Regular fault (stage 2 page not present)
> 2. Write fault (CoW)
> 3. XN fault (to clear the I-cache)
>
> I think a comment explaining that these are the cases and how we handle
> them is justified at this point.

Sure. I'll add that as I revisit the series.

	M.
--
Fast, cheap, reliable. Pick two.
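
As an illustration of the comment being asked for, a sketch summarizing the
three cases above (the wording is a guess, not the comment that eventually
went into the series):

	/*
	 * Three kinds of faults bring us here:
	 *
	 * 1. Regular translation fault: no stage-2 PTE exists yet, so
	 *    install a fresh read-only, non-executable mapping of the pfn.
	 * 2. Write (CoW) fault: the pfn backing the page may have changed
	 *    (e.g. a broken KSM merge), so only reuse the existing PTE if
	 *    it still maps the same pfn, then set the writable bit.
	 * 3. XN fault: the page was mapped non-executable so the I-cache
	 *    could be cleaned lazily; keep the existing permissions and
	 *    just clear the XN bit.
	 */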