Re: [RFC] KVM: x86: conditionally acquire/release slots_lock on entry/exit

On Fri, Aug 28, 2009 at 09:50:36AM +0300, Avi Kivity wrote:
> On 08/28/2009 01:59 AM, Marcelo Tosatti wrote:
>> On Thu, Aug 27, 2009 at 07:27:48PM +0300, Avi Kivity wrote:
>>    
>>> On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
>>>      
>>>> perf report shows heavy overhead from down/up of slots_lock.
>>>>
>>>> Attempted to remove slots_lock by having vcpus stop on a synchronization
>>>> point, but this introduced further complexity (a vcpu can be scheduled
>>>> out before reaching the synchronization point, and can sched back in at
>>>> points which are slots_lock protected, etc).
>>>>
>>>> This patch changes vcpu_enter_guest to conditionally release/acquire
>>>> slots_lock in case a vcpu state bit is set.
>>>>
>>>> vmexit performance improves by 5-10% on UP guest.
>>>>
>>>>        
>>> Sorry, it looks pretty complex.
>>>      
>> Why?
>>    
>
> There's a new locking protocol in there.  I prefer sticking with the  
> existing kernel plumbing, or it gets more and more difficult knowing who  
> protects what and in what order you can do things.
>
>>> Have you considered using srcu? It seems to me down/up_read() can
>>> be replaced by srcu_read_lock/unlock(), and after proper conversion
>>> of memslots and io_bus to rcu_assign_pointer(), we can just add
>>> synchronize_srcu() immediately after changing stuff (of course
>>> mmu_lock still needs to be held when updating slots).
>>>      
>> I don't see RCU as being suitable because in certain operations you
>> want to stop writers (on behalf of vcpus), do something, and let them
>> continue afterwards. The dirty log, for example. Or any operation that
>> wants to modify lockless vcpu specific data.
>>    
>
> kvm_flush_remote_tlbs() (which you'd call after mmu operations), will  
> get cpus out of guest mode, and synchronize_srcu() will wait for them to  
> drop the srcu "read lock".  So it really happens naturally: do an RCU  
> update, send some request to all vcpus, synchronize_srcu(), done.
>
>> Also, synchronize_srcu() is limited to preemptible sections.
>>
>> io_bus could use RCU, but I think being able to stop vcpus is also a
>> different requirement. Do you have any suggestion on how to do it in a
>> better way?
>>    
>
> We don't need to stop vcpus, just kick them out of guest mode to let  
> them notice the new data.  SRCU does that well.

Two problems:

1. The removal of memslots/aliases and zapping of mmu (to remove any
shadow pages with stale sp->gfn) must be atomic from the POV of the
pagefault path. Otherwise something like this can happen:

fault path					set_memory_region

walk_addr fetches and validates
table_gfns
						delete memslot
						synchronize_srcu

fetch creates shadow
page with nonexistent sp->gfn

OR

mmu_alloc_roots path				set_memory_region
						
						delete memslot
root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT
mmu_check_root(root_gfn)			synchronize_srcu
kvm_mmu_get_page()
						kvm_mmu_zap_all

Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all
can see shadow pages with a stale gfn.

But if you still think it's worthwhile to use RCU, then at least
handling gfn_to_memslot / unalias_gfn errors _and_ adding
mmu_notifier_retry-style invalidation to the set_memory_region path
will be necessary (so that the gfn_to_pfn validation done in the fault
path is discarded in case of a memslot/alias update).

2. Another complication is that memslot creation and kvm_iommu_map_pages
are not atomic.

create memslot
synchronize_srcu
				<----- vcpu grabs gfn reference without
				       iommu mapping.
kvm_iommu_map_pages

This can be solved by changing kvm_iommu_map_pages (and the new
gfn_to_pfn helper) to take base_gfn/npages/hva from somewhere other
than the visible kvm->memslots, so that by the time a slot becomes
visible it is already iommu mapped.

So it appears to me that using RCU introduces more complications and
subtle details than mutual exclusion here. The new request bit the
original patch introduces merely enables/disables conditional
acquisition of slots_lock (calling it a "new locking protocol" is
unfair) to improve write-acquisition latency.

Better ideas/directions welcome.
