On Friday, November 25, 2016 10:45:28 AM Paolo Bonzini wrote:
> On 24/11/2016 04:50, Junaid Shahid wrote:
> > On Monday, November 21, 2016 03:42:23 PM Paolo Bonzini wrote:
> >> Please introduce a new function spte_is_access_tracking_enabled(u64
> >> old_spte, u64 new_spte) and use it here:
> >>
> >>	if (shadow_accessed_mask ?
> >>	    spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask) :
> >>	    spte_is_access_tracking_enabled(old_spte, new_spte)) {
> >>		flush |= !shadow_accessed_mask;
> >>		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> >>	}
> >>
> >
> > I think we can just set flush = true in the then block instead of
> > flush |= !shadow_accessed_mask.
> >
> > And while we are at it, is there any reason to flush the TLB when setting the
> > A or D bit in the PTE? If not, we can remove this earlier block since the
> > clearing case is now handled in the separate if blocks for accessed and dirty:
>
> Hmm, flushing the TLB is expensive. I would have thought that we want to avoid
> a shootdown (kvm_flush_remote_tlbs) for the A and D bits. But it's probably rare
> enough that it doesn't matter, and the existing code has
>
>	/*
>	 * Flush TLB when accessed/dirty bits are changed in the page tables,
>	 * to guarantee consistency between TLB and page tables.
>	 */
>	if (spte_is_bit_changed(old_spte, new_spte,
>				shadow_accessed_mask | shadow_dirty_mask))
>		ret = true;
>
> so yeah.

Ok. So I'll remove the existing spte_is_bit_changed block and set flush = true
inside the separate blocks that check the accessed and dirty masks.

> >> But please anticipate the changes to this function, except for
> >> introducing is_access_track_spte of course, to a separate function.
> >
> > Sorry, I didn't exactly understand what you meant by the last line.
> > But I have made it like this:
> >
> >	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
> >		flush = true;
> >		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> >	}
> >
> >	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
> >		flush = true;
> >		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> >	}
>
> The idea is to split this patch in two. You can first refactor the function to
> have the above code and introduce is_accessed_spte/is_dirty_spte. Then, you
> add the lockless access tracking on top. But it's okay if we leave it for a
> subsequent review. There are going to be many changes already between v2 and v3!

Thanks for the clarification. I guess I can just add the other patch now to
separate out the refactoring from the rest of the access tracking changes.

> >>> +	spte &= ~(shadow_acc_track_saved_bits_mask <<
> >>> +		  shadow_acc_track_saved_bits_shift);
> >>
> >> Should there be a WARN_ON if these bits are not zero?
> >
> > No, these bits can be non-zero from a previous instance of
> > mark_spte_for_access_track. They are not cleared when the PTE is restored to
> > normal state. (Though we could do that and then have a WARN_ONCE here.)
>
> Ok, if it's not too hard, do add it. I think it's worth having
> more self-checks.

Sure. Will do.

> >>> @@ -2953,36 +3104,43 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> >>>
> >>>  	/*
> >>> -	 * Check if it is a spurious fault caused by TLB lazily flushed.
> >>> +	 * Check whether the memory access that caused the fault would
> >>> +	 * still cause it if it were to be performed right now. If not,
> >>> +	 * then this is a spurious fault caused by TLB lazily flushed,
> >>> +	 * or some other CPU has already fixed the PTE after the
> >>> +	 * current CPU took the fault.
> >>> +	 *
> >>> +	 * If Write-Only mappings ever become supported, then the
> >>> +	 * condition below would need to be changed appropriately.
> >>>  	 *
> >>>  	 * Need not check the access of upper level table entries since
> >>>  	 * they are always ACC_ALL.
> >>>  	 */
> >>> -	if (is_writable_pte(spte)) {
> >>> -		ret = true;
> >>> +	if (((spte & PT_PRESENT_MASK) && !remove_write_prot) ||
> >>> +	    valid_exec_access) {
> >>> +		fault_handled = true;
> >>>  		break;
> >>>  	}
> >>
> >> Let's separate the three conditions (R/W/X):
> >>
> >>	if ((error_code & PFERR_FETCH_MASK) {
> >>		if ((spte & (shadow_x_mask|shadow_nx_mask))
> >>		    == shadow_x_mask) {
> >>			fault_handled = true;
> >>			break;
> >>		}
> >>	}
> >>	if (error_code & PFERR_WRITE_MASK) {
> >>		if (is_writable_pte(spte)) {
> >>			fault_handled = true;
> >>			break;
> >>		}
> >>		remove_write_prot =
> >>			spte_can_locklessly_be_made_writable(spte);
> >>	}
> >>	if (!(error_code & PFERR_PRESENT_MASK)) {
> >>		if (!is_access_track_spte(spte)) {
> >>			fault_handled = true;
> >>			break;
> >>		}
> >>		remove_acc_track = true;
> >>	}
> >
> > I think the third block is incorrect e.g. it will set fault_handled = true even
> > for a completely zero PTE.
>
> A completely zero PTE would have been filtered before by the
> is_shadow_present_pte check, wouldn't it?

Oh, the is_shadow_present_pte check was actually removed in the patch. We could
add it back, minus the ret = true statement, and then it would filter the zero
PTE case. But I still think that the other form:

	if ((error_code & PFERR_USER_MASK) && (spte & PT_PRESENT_MASK)) {
		fault_handled = true;
		break;
	}

is simpler, as it is directly analogous to the cases for fetch and write. Please
let me know if you think otherwise.

> >>> +/* This is only supposed to be used for non-EPT mappings */
> >>
> >> It's only used for non-EPT mappings, why is it only *supposed* to be
> >> used for non-EPT mappings? It seems to me that it would work.
> >>
> >>> static bool need_remote_flush(u64 old, u64 new)
> >>> {
> >
> > It would work, but it will return true in at least one (probably the only)
> > case where it doesn't need to, e.g. going from an acc-track PTE to a zeroed
> > PTE. Though maybe that is not a big deal. Of course, we could also just
> > update it to handle acc-track.
>
> Yeah, I think it's not a big deal. But actually is it really used only
> for non-EPT? Nested virtualization uses kvm_mmu_pte_write for EPT as well.

Ok. I guess it might be more accurate to say indirect mappings instead of
non-EPT mappings. In any case, I'll just remove the comment.

Thanks,
Junaid
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
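[Editor's note: for readers following the thread, the is_accessed_spte/is_dirty_spte refactor discussed above can be sketched as a standalone userspace program. The mask values below are hypothetical placeholders, not the kernel's real SPTE bit layout, and the helpers omit the kernel's fallback for hardware without A/D bits (where a present SPTE counts as accessed and a writable one as dirty).]

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions; the real kernel masks differ. */
#define SHADOW_ACCESSED_MASK (1ULL << 5)
#define SHADOW_DIRTY_MASK    (1ULL << 6)
#define SPTE_PRESENT_MASK    (1ULL << 0)

static bool is_accessed_spte(uint64_t spte)
{
	/* The kernel version also treats any present SPTE as accessed
	 * when hardware lacks A/D bits; that fallback is omitted here. */
	return spte & SHADOW_ACCESSED_MASK;
}

static bool is_dirty_spte(uint64_t spte)
{
	return spte & SHADOW_DIRTY_MASK;
}

/* Mirrors the refactored update path discussed above: request a TLB
 * flush only when an update clears an accessed or dirty indication,
 * replacing the old spte_is_bit_changed check. */
static bool spte_update_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
	bool flush = false;

	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte))
		flush = true;	/* kernel also calls kvm_set_pfn_accessed() */

	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
		flush = true;	/* kernel also calls kvm_set_pfn_dirty() */

	return flush;
}
```

Note that setting the A or D bit (old clear, new set) produces no flush, which is the behavior change agreed on in the thread.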