Hi Paolo, Thank you for the detailed feedback. I will send an updated version of the patch soon. A few comments below: On Monday, November 21, 2016 03:42:23 PM Paolo Bonzini wrote: > > @@ -576,20 +617,34 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) > > */ > > if (spte_is_bit_changed(old_spte, new_spte, > > shadow_accessed_mask | shadow_dirty_mask)) > > - ret = true; > > + flush = true; > > > > - if (spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask)) > > + writable_cleared = is_writable_pte(old_spte) && > > + !is_writable_pte(new_spte); > > + acc_track_enabled = !is_access_track_spte(old_spte) && > > + is_access_track_spte(new_spte); > > + > > + if (writable_cleared || acc_track_enabled) > > + flush = true; > > + > > + if (shadow_accessed_mask ? > > + spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask) : > > + acc_track_enabled) > > Please introduce a new function spte_is_access_tracking_enabled(u64 > old_spte, u64 new_spte) and use it here: > > if (shadow_accessed_mask ? > spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask) : > spte_is_access_tracking_enabled(old_spte, new_spte)) { > flush |= !shadow_accessed_mask; > kvm_set_pfn_accessed(spte_to_pfn(old_spte)); > } > I think we can just set flush = true in the then block instead of flush |= !shadow_accessed_mask. And while we are at it, is there any reason to flush the TLB when setting the A or D bit in the PTE? If not, we can remove this earlier block since the clearing case is now handled in the separate if blocks for accessed and dirty: if (spte_is_bit_changed(old_spte, new_spte, shadow_accessed_mask | shadow_dirty_mask)) flush = true; Also, instead of spte_is_access_tracking_enabled(), I’ve added is_accessed_spte as you suggested later and used that here as well. > > - if (spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask)) > > + > > + if (shadow_dirty_mask ? > > + spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask) : > > + writable_cleared) > > writable_cleared can be inline here and written as > > spte_is_bit_cleared(old_spte, new_spte, PT_WRITABLE_MASK) > > so > > if (shadow_dirty_mask ? > spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask) : > spte_is_bit_cleared(old_spte, new_spte, PT_WRITABLE_MASK)) { > flush |= !shadow_dirty_mask; > kvm_set_pfn_dirty(spte_to_pfn(old_spte)); > } > > kvm_set_pfn_dirty(spte_to_pfn(old_spte)); > > > > - return ret; > > + return flush; > > } > > But please anticipate the changes to this function, except for > introducing is_access_track_spte of course, to a separate function. Sorry, I didn’t exactly understand what you meant by the last line. But I have made it like this: if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) { flush = true; kvm_set_pfn_accessed(spte_to_pfn(old_spte)); } if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) { flush = true; kvm_set_pfn_dirty(spte_to_pfn(old_spte)); } > > static int mmu_spte_clear_track_bits(u64 *sptep) > > { > > @@ -604,6 +659,13 @@ static int mmu_spte_clear_track_bits(u64 *sptep) > > if (!is_shadow_present_pte(old_spte)) > > return 0; > > > > + /* > > + * For access tracking SPTEs, the pfn was already marked accessed/dirty > > + * when the SPTE was marked for access tracking, so nothing to do here. > > + */ > > + if (is_access_track_spte(old_spte)) > > + return 1; > > This should go after the "WARN_ON", since that's a valuable check. In > addition, I think it's a good idea to keep similar idioms between > mmu_spte_update and mmu_spte_clear_track_bits, like this: > > if (shadow_accessed_mask > ? old_spte & shadow_accessed_mask > : !is_access_track_spte(old_spte)) > kvm_set_pfn_accessed(pfn); > if (shadow_dirty_mask > ? old_spte & shadow_dirty_mask > : old_spte & PT_WRITABLE_MASK) > kvm_set_pfn_dirty(pfn); > > return 1; > > or (you pick) > > if (shadow_accessed_mask > ? !(old_spte & shadow_accessed_mask) > : is_access_track_spte(old_spte)) > return 1; > > kvm_set_pfn_accessed(pfn); > if (shadow_dirty_mask > ? old_spte & shadow_dirty_mask > : old_spte & PT_WRITABLE_MASK) > kvm_set_pfn_dirty(pfn); > > return 1; I’ve replaced the checks with if(is_accessed_spte...) / if (is_dirty_spte...) > > + /* > > + * Any PTE marked for access tracking should also be marked for dirty > > + * tracking (by being non-writable) > > + */ > > + spte &= ~PT_WRITABLE_MASK; > > This should be implicit in the definition of > shadow_acc_track_mask/value, so it's not necessary (or it can be a > WARN_ONCE). It can't be handled by shadow_acc_track_mask/value, but I have changed shadow_acc_track_saved_bits_mask to save only the R/X bits, which achieves the same result. > > > + spte &= ~(shadow_acc_track_saved_bits_mask << > > + shadow_acc_track_saved_bits_shift); > > Should there be a WARN_ON if these bits are not zero? No, these bits can be non-zero from a previous instance of mark_spte_for_access_track. They are not cleared when the PTE is restored to normal state. (Though we could do that and then have a WARN_ONCE here.) > > @@ -2919,10 +3063,17 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, > > * > > * Compare with set_spte where instead shadow_dirty_mask is set. > > */ > > - if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) != spte) > > + if (cmpxchg64(sptep, old_spte, new_spte) != old_spte) > > return false; > > > > - kvm_vcpu_mark_page_dirty(vcpu, gfn); > > + if (remove_write_prot) { > > Stupid question ahead, why not call kvm_vcpu_mark_page_accessed in this > function? We could, but I kept that call in the same paths as before for consistency with the other cases. Plus, even though it is a cheap call, why add it to the fast PF path if it is not necessary. > > @@ -2953,36 +3104,43 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, > > > > /* > > - * Check if it is a spurious fault caused by TLB lazily flushed. > > + * Check whether the memory access that caused the fault would > > + * still cause it if it were to be performed right now. If not, > > + * then this is a spurious fault caused by TLB lazily flushed, > > + * or some other CPU has already fixed the PTE after the > > + * current CPU took the fault. > > + * > > + * If Write-Only mappings ever become supported, then the > > + * condition below would need to be changed appropriately. > > * > > * Need not check the access of upper level table entries since > > * they are always ACC_ALL. > > */ > > - if (is_writable_pte(spte)) { > > - ret = true; > > + if (((spte & PT_PRESENT_MASK) && !remove_write_prot) || > > + valid_exec_access) { > > + fault_handled = true; > > break; > > } > > Let's separate the three conditions (R/W/X): > > if ((error_code & PFERR_FETCH_MASK) { > if ((spte & (shadow_x_mask|shadow_nx_mask)) > == shadow_x_mask) { > fault_handled = true; > break; > } > } > if (error_code & PFERR_WRITE_MASK) { > if (is_writable_pte(spte)) { > fault_handled = true; > break; > } > remove_write_prot = > spte_can_locklessly_be_made_writable(spte); > } > if (!(error_code & PFERR_PRESENT_MASK)) { > if (!is_access_track_spte(spte)) { > fault_handled = true; > break; > } > remove_acc_track = true; > } I think the third block is incorrect e.g. it will set fault_handled = true even for a completely zero PTE. I have replaced it with the following: if ((error_code & PFERR_USER_MASK) && (spte & PT_PRESENT_MASK)) { fault_handled = true; break; } remove_acc_track = is_access_track_spte(spte); > > > > +/* This is only supposed to be used for non-EPT mappings */ > > It's only used for non-EPT mappings, why is it only *supposed* to be > used for non-EPT mappings? It seems to me that it would work. > > > static bool need_remote_flush(u64 old, u64 new) > > { It would work but it will return true in at least one (probably only) case where it doesn’t need to e.g. going from an acc-track PTE to a zeroed PTE. Though maybe that is not a big deal. Of course, we could also just update it to handle acc-track. Thanks, Junaid -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html