Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 03, 2012 at 05:11:01PM +0300, Avi Kivity wrote:
> On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
> > On 05/03/2012 07:22 PM, Avi Kivity wrote:
> >
> > > Currently we flush the TLB while holding mmu_lock.  This
> > > increases the lock hold time by the IPI round-trip time, increasing
> > > contention, and makes dropping the lock (for latency reasons) harder.
> > > 
> > > This patch changes TLB management to be usable locklessly, introducing
> > > the following APIs:
> > > 
> > >   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
> > >   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
> > >                                   dirty
> > > 
> > > These APIs can be used without holding mmu_lock (though if the TLB
> > > became stale due to shadow page table modifications, typically it
> > > will need to be called with the lock held to prevent other threads
> > > from seeing the modified page tables with the TLB unmarked and unflushed)/
> > > 
> > > Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>
> > > ---
> > >  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
> > >  arch/x86/kvm/paging_tmpl.h            |    4 ++--
> > >  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
> > >  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
> > >  4 files changed, 57 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > > index 3b4cd3b..f6c90479 100644
> > > --- a/Documentation/virtual/kvm/locking.txt
> > > +++ b/Documentation/virtual/kvm/locking.txt
> > > @@ -23,3 +23,17 @@ Arch:		x86
> > >  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
> > >  		- tsc offset in vmcb
> > >  Comment:	'raw' because updating the tsc offsets must not be preempted.
> > > +
> > > +3. TLB control
> > > +--------------
> > > +
> > > +The following APIs should be used for TLB control:
> > > +
> > > + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> > > +      either guest or host page tables.
> > > + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> > > + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> > > +
> > > +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> > > +used while holding mmu_lock if it is called due to host page table changes
> > > +(contrast to guest page table changes).
> >
> >
> > In these patches, it seems that kvm_mark_tlb_dirty is always used
> > under the protection of mmu-lock, yes?
> 
> Correct.  It's possible we'll find a use outside mmu_lock, but this
> isn't likely.
> 
> > If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are use
> > out of mmu-lock, i think we can use kvm_flush_remote_tlbs instead.
> >
> > If it is so, dirtied_count/flushed_count need not be atomic.
> 
> But we always mark with mmu_lock held.
> 
> >
> > And, it seems there is a bug:
> >
> >  VCPU 0                              VCPU 1
> >
> >  hold mmu-lock
> >  zap spte which points to 'gfn'
> >  mark_tlb_dirty
> >  release mmu-lock
> >                                     hold mmu-lock
> >                                     rmap_write-protect:
> >                                        see no spte pointing to gfn
> >                                        tlb did not be flushed
> >                                     release mmu-lock
> >
> >                                     write gfn by guest
> >                                       OOPS!!!
> >
> >  kvm_cond_flush_remote_tlbs
> >
> > We need call kvm_cond_flush_remote_tlbs in rmap_write-protect
> > unconditionally?
> 
> Yes, that's the point.  We change
> 
>    spin_lock(mmu_lock)
>    conditionally flush the tlb
>    spin_unlock(mmu_lock)
> 
> to
> 
>   spin_lock(mmu_lock)
>   conditionally mark the tlb as dirty
>   spin_unlock(mmu_lock)
>   kvm_cond_flush_remote_tlbs()
> 
> but that means the entire codebase has to be converted.

Is there any other site which expects sptes and TLBs to be in sync,     
other than rmap_write_protect?                                          

Please convert the flush_remote_tlbs at the end of set_spte (RW->RO) to
mark_dirty.

Looks good in general (patchset is incomplete though). One thing that
is annoying is that there is no guarantee of progress for flushed_count
increment (it can, in theory, always race with a mark_dirty). But since
that would be only a performance, and not correctness, aspect, it is
fine.

It has the advantage that it requires code to explicitly document where
the TLB must be flushed and the sites which expect sptes to be in sync
with TLBs.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux