On Thu, Feb 21, 2013 at 6:01 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> On Wed, 20 Feb 2013 15:37:38 -0800, Christoffer Dall
> <cdall@xxxxxxxxxxxxxxx> wrote:
>> On Wed, Feb 13, 2013 at 03:47:01PM +0000, Marc Zyngier wrote:
>>> user_mem_abort always creates a brand new PTE, even when handling
>>> a permission fault. This is semantically wrong, as such a fault
>>> should be handled by reading the existing entry and modifying it.
>>
>> Why is this semantically wrong?
>
> Because we clear existing bits that could have been set, and set default
> bits that could have been cleared. We should perform a read/modify/write
> cycle instead.
>
>> Where does the notion of what a fault handler should do come from?
>
> Mostly, from what the rest of the kernel does.
>
>> The fault handler needs to make sure that the right page gets mapped in
>> with the appropriate permissions. In fact I think this change is
>> incorrect, see below.
>>
>>>
>>> Convert user_mem_abort to stage2_get_pte()/stage2_set_pte_at(),
>>> giving the opportunity to test the present bit, and only use the
>>> default PTE if nothing was present yet.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>> ---
>>>  arch/arm/kvm/mmu.c | 13 ++++++++++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 8fde75b..d033344 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -519,7 +519,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>                           gfn_t gfn, struct kvm_memory_slot *memslot,
>>>                           unsigned long fault_status)
>>>  {
>>> -       pte_t new_pte;
>>> +       pte_t new_pte, *ptep;
>>>         pfn_t pfn;
>>>         int ret;
>>>         bool write_fault, writable;
>>> @@ -553,17 +553,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>         if (is_error_pfn(pfn))
>>>                 return -EFAULT;
>>>
>>> -       new_pte = pfn_pte(pfn, PAGE_S2);
>>>         coherent_icache_guest_page(vcpu->kvm, gfn);
>>>
>>>         spin_lock(&vcpu->kvm->mmu_lock);
>>>         if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
>>>                 goto out_unlock;
>>> +
>>> +       ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);
>>> +       if (pte_present(*ptep))
>>> +               new_pte = *ptep;
>>
>> This looks wrong to me.
>>
>> So if KSM merged two pages here and you fault on the page with a write
>> fault, then you're going to just set the writable bit on the page that
>> should still be read-only.
>
> Ah, well spotted.
>
>> Can you explain which problem you're trying to fix here?
>
> It's quite simple, if you think of the last patch which introduces the
> handling of XN faults.
>
> If we introduce the XN handling in a naive way (just adding another test
> for the XN fault, and clearing the XN bit from new_pte), we also clear any
> RW bit that would have been set by a previous handler.
>
> So if you have a page containing code that writes to the same page, you're
> caught in a deadly loop of XN/RO faults.
>
> I could rewrite this code to read:
>
>         if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
>                 new_pte = *ptep;
>
> hence checking that the page hasn't been changed by CoW. That way, we
> default to a brand new RO/XN PTE if the page has been changed.
>
> What do you think?
>
Yeah, that's reasonable.

So we have three cases, if I can summarize properly:

1. Regular fault (stage 2 page not present)
2. Write fault (CoW)
3. XN fault (to clear the I-cache)

I think explaining that these are the cases, and how we handle them, in a
comment is justified at this point.
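For the record, here is a rough sketch of what I'd expect the end result
(comment included) to look like. This is untested pseudocode: exec_fault and
kvm_set_s2pte_exec() are made-up stand-ins for however the XN patch ends up
detecting and clearing XN, and I'm guessing at the stage2_set_pte_at()
argument order.

        ptep = stage2_get_pte(vcpu->kvm, memcache, fault_ipa);

        /*
         * Three cases to handle:
         * 1. Regular fault: no stage-2 mapping yet, or the backing page
         *    changed under us (KSM/CoW) -> start from a fresh RO/XN PTE.
         * 2. Write fault (CoW): keep the existing bits, add write access.
         * 3. XN fault: keep the existing bits (including an RW bit set by
         *    an earlier fault), only clear XN.
         */
        if (pte_present(*ptep) && pte_pfn(*ptep) == pfn)
                new_pte = *ptep;                        /* read/modify/write */
        else
                new_pte = pfn_pte(pfn, PAGE_S2);        /* fresh default PTE */

        if (writable)
                kvm_set_s2pte_writable(&new_pte);       /* case 2 */
        if (exec_fault)
                kvm_set_s2pte_exec(&new_pte);           /* case 3, made-up helper */

        stage2_set_pte_at(vcpu->kvm, fault_ipa, ptep, new_pte);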
Thanks,
-Christoffer