Re: [patch 4/5] KVM: MMU: reload request from GET_DIRTY_LOG path

On Mon, Jun 30, 2014 at 05:59:02PM -0300, Marcelo Tosatti wrote:
> On Fri, Jun 20, 2014 at 01:46:10PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 19, 2014 at 03:40:31PM -0300, Marcelo Tosatti wrote:
> > > On Thu, Jun 19, 2014 at 11:17:19AM +0300, Gleb Natapov wrote:
> > > > On Wed, Jun 18, 2014 at 08:12:07PM -0300, mtosatti@xxxxxxxxxx wrote:
> > > > > Reload remote vcpus' MMUs from the GET_DIRTY_LOG codepath, before
> > > > > deleting a pinned spte.
> > > > > 
> > > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > > > > 
> > > > > ---
> > > > >  arch/x86/kvm/mmu.c |    3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
> > > > > ===================================================================
> > > > > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c	2014-06-13 16:50:50.040140594 -0300
> > > > > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c	2014-06-13 16:51:05.620104451 -0300
> > > > > @@ -1247,6 +1247,9 @@
> > > > >  		spte &= ~SPTE_MMU_WRITEABLE;
> > > > >  	spte = spte & ~PT_WRITABLE_MASK;
> > > > >  
> > > > > +	if (is_pinned_spte(spte))
> > > > > +		mmu_reload_pinned_vcpus(kvm);
> > > > > +
> > > > Why write protect it at all? mmu_reload_pinned_vcpus() will unprotect it anyway
> > > > on the next vmentry. Isn't it better to just report all pinned pages as dirty always?
> > > 
> > > That was the initial plan, however its awkward to stop vcpus, execute
> > > get_dirty_log twice, and have pages marked as dirty on the second
> > > execution.
> > Indeed, but I think it may happen today with vhost (or other devices
> > that emulate DMA), so userspace should be able to deal with it already.
> > 
> > > 
> > > That is, it is "incorrect" to report pages as dirty when they are
> > > clean.
> > In the case of the PEBS area we cannot know. The CPU writes there directly
> > without KVM even knowing about it, so the only sane thing is to assume that
> > after a vmentry the PEBS area is dirty. This is what happens with this patch
> > BTW: mmu_reload_pinned_vcpus() will force ->page_fault(FERR_WRITE), which
> > will mark all pinned pages as dirty even if the pages are never written to.
> > You can achieve the same by having a vcpu->pinned_page_dirty flag which is
> > set to true on each vmentry and to false on each GET_DIRTY_LOG.
> > 
> > > 
> > > Moreover, if the number of pinned pages is larger than the dirty
> > > threshold to stop the VM and migrate, you'll never migrate. If vcpus are
> > > in HLT and don't VM-enter immediately, the pages should not be refaulted
> > > right away.
> > We should not allow that many pinned pages, for security reasons. And
> > having a lot of pages to fault in on the next vmentry after each
> > GET_DIRTY_LOG will slow down the guest during migration considerably.
> > 
> > > 
> > > Do you think the optimization is worthwhile ?
> > > 
> > I think it's simpler, but even if we go with your current approach
> > it should be improved: there is no point sending an IPI to all vcpus in
> > spte_write_protect() like you do here, since the IPI will be sent anyway at
> > the end of write protection because of the ptes' writable->nonwritable
> > transition. So all you have to do here is set KVM_REQ_MMU_RELOAD; no need
> > for an IPI.
> 
> No, you have to make sure the vcpu is out of guest mode.
> 
> > In fact this makes me think that write protecting the pinned page here is
> > incorrect, because the old translation may not be in the TLB, and if the CPU
> > tries to write a PEBS entry after the pte is write protected but before the
> > MMU is reloaded, it will encounter a non-writable pte and god knows what
> > will happen; the SDM does not say. So spte_write_protect() should be
> > something like this IMO:
> 
> The vcpu will never see a read-only spte because the VM-exit (due to the
> IPI) guarantees the vcpu is outside of guest mode _before_ the spte is
> write protected.
Right. Now I see why you absolutely have to send an IPI in mmu_reload_pinned_vcpus()
before marking the pte as read-only. And kvm->mmu_lock is what prevents the vcpu from
re-entering guest mode before the pte is marked read-only, right?

> 
> So I ask you: do you still hold the "current approach should be
> improved" position?
> 
As I said, IMO what I proposed is much simpler and not as tricky as what you have here.
It also has the advantage of not slowing down the next guest entry after GET_DIRTY_LOG,
because it does not require an MMU reload and page-faulting in the pinned pages.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html