On Fri, Jun 20, 2014 at 01:46:10PM +0300, Gleb Natapov wrote:
> On Thu, Jun 19, 2014 at 03:40:31PM -0300, Marcelo Tosatti wrote:
> > On Thu, Jun 19, 2014 at 11:17:19AM +0300, Gleb Natapov wrote:
> > > On Wed, Jun 18, 2014 at 08:12:07PM -0300, mtosatti@xxxxxxxxxx wrote:
> > > > Reload remote vcpus' MMU from the GET_DIRTY_LOG codepath, before
> > > > deleting a pinned spte.
> > > >
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > > >
> > > > ---
> > > >  arch/x86/kvm/mmu.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
> > > > ===================================================================
> > > > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c	2014-06-13 16:50:50.040140594 -0300
> > > > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c	2014-06-13 16:51:05.620104451 -0300
> > > > @@ -1247,6 +1247,9 @@
> > > >  		spte &= ~SPTE_MMU_WRITEABLE;
> > > >  	spte = spte & ~PT_WRITABLE_MASK;
> > > >
> > > > +	if (is_pinned_spte(spte))
> > > > +		mmu_reload_pinned_vcpus(kvm);
> > > > +
> > > Why write protect it at all? mmu_reload_pinned_vcpus() will unprotect it anyway
> > > on the next vmentry. Isn't it better to just report all pinned pages as dirty always?
> >
> > That was the initial plan, however it is awkward to stop vcpus, execute
> > get_dirty_log twice, and have pages marked as dirty on the second
> > execution.
> Indeed, but I think it may happen today with vhost (or even other devices
> that emulate DMA), so userspace should be able to deal with it already.
>
> > That is, it is "incorrect" to report pages as dirty when they are
> > clean.
> In the case of the PEBS area we cannot know. The CPU writes there directly,
> without KVM even knowing about it, so the only sane thing is to assume that
> after a vmentry the PEBS area is dirty. This is what happens with this patch
> BTW: mmu_reload_pinned_vcpus() will force ->page_fault(FERR_WRITE), which
> will mark all pinned pages as dirty even if the pages are never written to.
> You can achieve the same by having vcpu->pinned_page_dirty, which is set to
> true on each vmentry and to false on each GET_DIRTY_LOG.
>
> > Moreover, if the number of pinned pages is larger than the dirty
> > threshold to stop the VM and migrate, you'll never migrate. If vcpus are
> > in HLT and don't VM-enter immediately, the pages should not be refaulted
> > right away.
> We should not allow that many pinned pages for security reasons. And
> having a lot of pages to fault in on the next vmentry after each
> GET_DIRTY_LOG will slow down a guest during migration considerably.
>
> > Do you think the optimization is worthwhile ?
>
> I think it's simpler, but even if we go with your current approach it
> should be improved: there is no point sending an IPI to all vcpus in
> spte_write_protect() like you do here, since the IPI will be sent anyway
> at the end of write protection because of the writable->nonwritable pte
> transition, so all you have to do here is set KVM_REQ_MMU_RELOAD; no need
> for an IPI.

No, you have to make sure the vcpu is out of guest mode.
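
To make the difference concrete, a rough sketch of the two behaviours being
discussed (the helper names below are made up for illustration; this is not
code from the series):

#include <linux/kvm_host.h>

/*
 * Request-only, as suggested above: the reload flag is only noticed at the
 * next VM-entry, so a vcpu already in guest mode can keep writing PEBS
 * records through the old, still-writable translation.
 */
static void reload_request_only(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm)
		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
}

/*
 * Request plus IPI: kvm_reload_remote_mmus() sets the same request on every
 * vcpu, but also IPIs the vcpus that are currently in guest mode and waits
 * for the IPIs to be acknowledged, so all vcpus are out of guest mode before
 * the caller goes on to clear the writable bit.
 */
static void reload_request_and_kick(struct kvm *kvm)
{
	kvm_reload_remote_mmus(kvm);
}
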
> In fact this makes me think that write protecting the pinned page here is
> incorrect, because the old translation may not be in the TLB, and if the
> CPU tries to write a PEBS entry after the pte is write protected but
> before the MMU is reloaded, it will encounter a non-writable pte and God
> knows what will happen; the SDM does not say.
>
> So spte_write_protect() should be something like this, IMO:

The vcpu will never see a read-only spte, because the VM-exit (due to the
IPI) guarantees the vcpu is outside of guest mode _before_ the spte is
write protected.

So I ask you: do you still hold the "current approach should be improved"
position?
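
For reference, the vcpu->pinned_page_dirty idea mentioned above would be
roughly along these lines (the field and helpers are illustrative only and
do not exist in the tree):

#include <linux/kvm_host.h>

/* On every VM-entry, assume the CPU may write the PEBS area. */
static void pinned_mark_dirty_on_entry(struct kvm_vcpu *vcpu)
{
	vcpu->pinned_page_dirty = true;	/* hypothetical field on struct kvm_vcpu */
}

/*
 * In the GET_DIRTY_LOG path: report the pinned pages as dirty if the vcpu
 * entered guest mode since the last call, then clear the flag.
 */
static bool pinned_test_and_clear_dirty(struct kvm_vcpu *vcpu)
{
	bool dirty = vcpu->pinned_page_dirty;

	vcpu->pinned_page_dirty = false;
	return dirty;
}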