On 05/23/2012 11:55 AM, Xiao Guangrong wrote:
> If the present bit of page fault error code is set, it indicates
> the shadow page is populated on all levels, it means what we do is
> only modify the access bit which can be done out of mmu-lock
>
> Currently, in order to simplify the code, we only fix the page fault
> caused by write-protect on the fast path
>
> +
>  static bool spte_has_volatile_bits(u64 spte)
>  {
> +	/*
> +	 * Always atomicly update spte if it can be updated
> +	 * out of mmu-lock.
> +	 */
> +	if (spte_can_lockless_update(spte))
> +		return true;

This is a really subtle point, but is it really needed?  Lockless spte
updates should always set the dirty and accessed bits, so we won't be
overwriting any volatile bits there.

> +
>  	if (!shadow_accessed_mask)
>  		return false;
>
> @@ -498,13 +517,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
>  		return ret;
>  	}
>
> -	new_spte |= old_spte & shadow_dirty_mask;
> -
> -	mask = shadow_accessed_mask;
> -	if (is_writable_pte(old_spte))
> -		mask |= shadow_dirty_mask;
> -
> -	if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
> +	if (!spte_has_volatile_bits(old_spte))
>  		__update_clear_spte_fast(sptep, new_spte);
>  	else
>  		old_spte = __update_clear_spte_slow(sptep, new_spte);

It looks like the old code is bad... why can we ignore volatile bits in
the old spte?  Suppose the pfn is changing?

> +
> +static bool
> +fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> +			  u64 *sptep, u64 spte, gfn_t gfn)
> +{
> +	pfn_t pfn;
> +	bool ret = false;
> +
> +	/*
> +	 * For the indirect spte, it is hard to get a stable gfn since
> +	 * we just use a cmpxchg to avoid all the races which is not
> +	 * enough to avoid the ABA problem: the host can arbitrarily
> +	 * change spte and the mapping from gfn to pfn.
> +	 *
> +	 * What we do is call gfn_to_pfn_atomic to bind the gfn and the
> +	 * pfn because after the call:
> +	 * - we have held the refcount of pfn that means the pfn can not
> +	 *   be freed and be reused for another gfn.
> +	 * - the pfn is writable that means it can not be shared by
> +	 *   different gfn.
> +	 */
> +	pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
> +
> +	/* The host page is swapped out or merged. */
> +	if (mmu_invalid_pfn(pfn))
> +		goto exit;
> +
> +	ret = true;
> +
> +	if (pfn != spte_to_pfn(spte))
> +		goto exit;
> +
> +	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
> +		mark_page_dirty(vcpu->kvm, gfn);

Isn't it better to kvm_release_pfn_dirty() here?

> +
> +exit:
> +	kvm_release_pfn_clean(pfn);
> +	return ret;
> +}
> +
> +
> +/*
> + * Return value:
> + * - true: let the vcpu to access on the same address again.
> + * - false: let the real page fault path to fix it.
> + */
> +static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
> +			    int level, u32 error_code)
> +{
> +	struct kvm_shadow_walk_iterator iterator;
> +	struct kvm_mmu_page *sp;
> +	bool ret = false;
> +	u64 spte = 0ull;
> +
> +	if (!page_fault_can_be_fast(vcpu, gfn, error_code))
> +		return false;
> +

No need to pass gfn here.

> +	walk_shadow_page_lockless_begin(vcpu);
> +	for_each_shadow_entry_lockless(vcpu, gva, iterator, spte)
> +		if (!is_shadow_present_pte(spte) || iterator.level < level)
> +			break;
> +
> +	/*
> +	 * If the mapping has been changed, let the vcpu fault on the
> +	 * same address again.
> +	 */
> +	if (!is_rmap_spte(spte)) {
> +		ret = true;
> +		goto exit;
> +	}
> +
> +	if (!is_last_spte(spte, level))
> +		goto exit;
> +
> +	/*
> +	 * Check if it is a spurious fault caused by TLB lazily flushed.
> +	 *
> +	 * Need not check the access of upper level table entries since
> +	 * they are always ACC_ALL.
> +	 */
> +	if (is_writable_pte(spte)) {
> +		ret = true;
> +		goto exit;
> +	}
> +
> +	/*
> +	 * Currently, to simplify the code, only the spte write-protected
> +	 * by dirty-log can be fast fixed.
> +	 */
> +	if (!spte_can_be_writable(spte))
> +		goto exit;
> +
> +	sp = page_header(__pa(iterator.sptep));
> +
> +	if (sp->role.direct)
> +		ret = fast_pf_fix_direct_spte(vcpu, sp, iterator.sptep, spte);
> +	else
> +		ret = fast_pf_fix_indirect_spte(vcpu, sp, iterator.sptep,
> +						spte, gfn);
> +
> +exit:
> +	walk_shadow_page_lockless_end(vcpu);
> +
> +	return ret;
> +}
> +

-- 
error compiling committee.c: too many arguments to function
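
As a concrete illustration of the kvm_release_pfn_dirty() suggestion above, here
is a minimal sketch of how the tail of fast_pf_fix_indirect_spte() could look if
the pfn were released as dirty on a successful cmpxchg.  This is not the posted
patch: the "_sketch" name is purely illustrative, the unused sp argument is
dropped, and it only reuses the helpers already visible in the quoted hunk
(gfn_to_pfn_atomic, mmu_invalid_pfn, spte_to_pfn, mark_page_dirty,
kvm_release_pfn_clean/kvm_release_pfn_dirty).

	/* Illustrative sketch only, not the posted patch. */
	static bool
	fast_pf_fix_indirect_spte_sketch(struct kvm_vcpu *vcpu, u64 *sptep,
					 u64 spte, gfn_t gfn)
	{
		pfn_t pfn;

		/* Pin the pfn so the gfn->pfn mapping cannot change under us. */
		pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);

		/* The host page is swapped out or merged. */
		if (mmu_invalid_pfn(pfn))
			return false;

		/* The mapping already changed; let the vcpu retry the access. */
		if (pfn != spte_to_pfn(spte)) {
			kvm_release_pfn_clean(pfn);
			return true;
		}

		if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte) {
			/* Record the write in the dirty log ... */
			mark_page_dirty(vcpu->kvm, gfn);
			/* ... and drop the reference with the page marked dirty. */
			kvm_release_pfn_dirty(pfn);
		} else {
			kvm_release_pfn_clean(pfn);
		}

		return true;
	}

Whether mark_page_dirty() is still wanted in addition to releasing the pfn as
dirty is exactly the open question raised in the comment above; the sketch keeps
both so the dirty-log behaviour of the quoted hunk is unchanged.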