Hi Marcelo, Thanks your time to review it. On 04/09/2014 10:51 PM, Marcelo Tosatti wrote: >> +/* >> + * Please note PT_WRITABLE_MASK is not stable since >> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or >> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log() >> + * can write protect sptes but flush tlb out mmu-lock that means we may use >> + * the corrupt tlb entries which depend on this bit. >> + * >> + * Both cases do not modify the status of spte_is_locklessly_modifiable() so >> + * if you want to check whether the spte is writable on MMU you can check >> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing >> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable() >> + * instead. See the comments in spte_has_volatile_bits() and >> + * mmu_spte_update(). >> + */ >> static inline int is_writable_pte(unsigned long pte) >> { > > Xiao, > > Can't get the SPTE_MMU_WRITEABLE part. > > So assume you are writing code to perform some action after guest memory > has been write protected. You would > > spin_lock(mmu_lock); > > if (writeable spte bit is set) > remove writeable spte bit from spte > remote TLB flush (*) > action > > spin_unlock(mmu_lock); > > (*) is necessary because reading the writeable spte bit as zero > does not guarantee remote TLBs have been flushed. > > Now what SPTE_MMU_WRITEABLE has to do with it ? It is contained in "remove writeable spte bit from spte" which is done by mmu_spte_update(): if (spte_is_locklessly_modifiable(old_spte) && !is_writable_pte(new_spte)) ret = true; > > Perhaps a recipe like that (or just the rules) would be useful. Okay, i am considering to improve this comments, how about like this: Currently, we have two sorts of write-protection, a) the first one write-protects guest page to sync the guest modification, b) another one is used to sync dirty bitmap when we do KVM_GET_DIRTY_LOG. The differences between these two sorts are: 1) the first case clears SPTE_MMU_WRITEABLE bit. 2) the first case requires flushing tlb immediately avoiding corrupting shadow page table between all vcpus so it should be in the protection of mmu-lock. And the another case does not need to flush tlb until returning the dirty bitmap to userspace since it only write-protects the page logged in the bitmap, that means the page in the dirty bitmap is not missed, so it can flush tlb out of mmu-lock. So, there is the problem: the first case can meet the corrupted tlb caused by another case which write-protects pages but without flush tlb immediately. In order to making the first case be aware this problem we let it flush tlb if we try to write-protect a spte whose SPTE_MMU_WRITEABLE bit is set, it works since another case never touches SPTE_MMU_WRITEABLE bit. Anyway, whenever a spte is updated (only permission and status bits are changed) we need to check whether the spte with SPTE_MMU_WRITEABLE becomes readonly, if that happens, we need to flush tlb. Fortunately, mmu_spte_update() has already handled it perfectly. The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK: - if we want to see if it has writable tlb entry or if the spte can be writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is the most case, otherwise - if we fix page fault on the spte or do write-protection by dirty logging, check PT_WRITABLE_MASK. TODO: introduce APIs to split these two cases. > > The remaining patches look good. Thank you. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html