Re: [patch 4/5] KVM: MMU: reload request from GET_DIRTY_LOG path

On Fri, Jun 20, 2014 at 01:46:10PM +0300, Gleb Natapov wrote:
> On Thu, Jun 19, 2014 at 03:40:31PM -0300, Marcelo Tosatti wrote:
> > On Thu, Jun 19, 2014 at 11:17:19AM +0300, Gleb Natapov wrote:
> > > On Wed, Jun 18, 2014 at 08:12:07PM -0300, mtosatti@xxxxxxxxxx wrote:
> > > > Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before
> > > > deleting a pinned spte.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > > > 
> > > > ---
> > > >  arch/x86/kvm/mmu.c |    3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
> > > > ===================================================================
> > > > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c	2014-06-13 16:50:50.040140594 -0300
> > > > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c	2014-06-13 16:51:05.620104451 -0300
> > > > @@ -1247,6 +1247,9 @@
> > > >  		spte &= ~SPTE_MMU_WRITEABLE;
> > > >  	spte = spte & ~PT_WRITABLE_MASK;
> > > >  
> > > > +	if (is_pinned_spte(spte))
> > > > +		mmu_reload_pinned_vcpus(kvm);
> > > > +
> > > Why write protect it at all? mmu_reload_pinned_vcpus() will unprotect it anyway
> > > on the next vmentry. Isn't it better to just report all pinned pages as dirty always?
> > 
> > That was the initial plan; however, it is awkward to stop vcpus, execute
> > get_dirty_log twice, and have pages marked as dirty on the second
> > execution.
> Indeed, but I think it may happen today with vhost (or even other devices
> that emulate dma), so userspace should be able to deal with it already.
> 
> > 
> > That is, it is "incorrect" to report pages as dirty when they are
> > clean.
> In the case of the PEBS area we cannot know. The CPU writes there directly,
> without KVM even knowing about it, so the only sane thing is to assume that
> after a vmentry the PEBS area is dirty. This is what happens with this patch,
> BTW: mmu_reload_pinned_vcpus() will force ->page_fault(FERR_WRITE), which will
> mark all pinned pages as dirty even if the pages are never written to. You
> can achieve the same by having a vcpu->pinned_page_dirty flag which is set to
> true on each vmentry and to false on each GET_DIRTY_LOG.
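
A minimal sketch of the bookkeeping described above; the pinned_page_dirty
field and the mark_pinned_pages_dirty() helper are hypothetical names for
illustration, not existing KVM code:

	/* On every vmentry: while in guest mode the CPU may write the
	 * PEBS area at any time, so assume it becomes dirty. */
	static void vcpu_entered_guest(struct kvm_vcpu *vcpu)
	{
		vcpu->pinned_page_dirty = true;	/* hypothetical field */
	}

	/* In the GET_DIRTY_LOG path: report the pinned pages as dirty,
	 * then rearm the flag for the next vmentry. */
	static void sync_pinned_dirty_log(struct kvm_vcpu *vcpu)
	{
		if (vcpu->pinned_page_dirty)
			mark_pinned_pages_dirty(vcpu);	/* hypothetical */
		vcpu->pinned_page_dirty = false;
	}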
> 
> > 
> > Moreover, if the number of pinned pages is larger than the dirty
> > threshold to stop the VM and migrate, you'll never migrate. And if the
> > vcpus are in HLT and don't VM-enter immediately, the pages should not
> > be refaulted right away.
> We should not allow that many pinned pages, for security reasons. And
> having a lot of pages to fault in on the next vmentry after each
> GET_DIRTY_LOG will slow the guest down considerably during migration.
> 
> > 
> > Do you think the optimization is worthwhile ?
> > 
> I think it's simpler, but even if we go with your current approach it
> should be improved: there is no point sending an IPI to all vcpus in
> spte_write_protect() like you do here, since the IPI will be sent anyway at
> the end of write protection because of the ptes' writable->nonwritable
> transition. All you have to do here is set KVM_REQ_MMU_RELOAD; no need for
> an IPI.

No, you have to make sure the vcpu is out of guest mode.
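
To illustrate the distinction: kvm_make_request() alone only flags the
request to be serviced at the next vmentry, while kvm_vcpu_kick() forces an
exit from guest mode now. A minimal sketch of a reload helper along these
lines (mmu_reload_pinned_vcpus() itself is introduced by this patch series;
this is an illustrative guess, not its actual body):

	static void reload_pinned_vcpus_sketch(struct kvm *kvm)
	{
		struct kvm_vcpu *vcpu;
		int i;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			/* The request alone is only serviced at the next
			 * vmentry; a vcpu already in guest mode could keep
			 * writing PEBS records through the old translation. */
			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
			/* The kick (IPI) forces the vcpu out of guest mode
			 * immediately, so it cannot be running when the spte
			 * is made read-only. */
			kvm_vcpu_kick(vcpu);
		}
	}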

> In fact this makes me think that write protecting a pinned page here is
> incorrect, because the old translation may not be in the TLB, and if the
> CPU tries to write a PEBS entry after the pte is write protected but
> before the MMU is reloaded, it will encounter a non-writable pte and god
> knows what will happen; the SDM does not say. So spte_write_protect()
> should be something like this, IMO:

The vcpu will never see a read-only spte, because the VM-exit (due to the
IPI) guarantees the vcpu is outside of guest mode _before_ the spte is
write protected.
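
In the patch, spte is a local copy, so the reload (and its IPI) happens
before the read-only value is ever written back. Paraphrasing the patched
spte_write_protect() to show the ordering (details such as the writability
check and rmap_printk() omitted; mmu_spte_update() is the existing mmu.c
helper that writes the new spte back):

	static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
	{
		u64 spte = *sptep;

		if (pt_protect)
			spte &= ~SPTE_MMU_WRITEABLE;
		spte &= ~PT_WRITABLE_MASK;

		/* 1. Force pinned vcpus out of guest mode first; after this
		 * returns, no vcpu is executing on the old, writable spte. */
		if (is_pinned_spte(spte))
			mmu_reload_pinned_vcpus(kvm);

		/* 2. Only now is the read-only spte written back and made
		 * visible to the hardware page-table walker. */
		return mmu_spte_update(sptep, spte);
	}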

So I ask you: do you still hold the "current approach should be
improved" position?
