On Fri, Aug 28, 2009 at 09:50:36AM +0300, Avi Kivity wrote:
> On 08/28/2009 01:59 AM, Marcelo Tosatti wrote:
>> On Thu, Aug 27, 2009 at 07:27:48PM +0300, Avi Kivity wrote:
>>
>>> On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
>>>
>>>> perf report shows heavy overhead from down/up of slots_lock.
>>>>
>>>> Attempted to remove slots_lock by having vcpus stop on a synchronization
>>>> point, but this introduced further complexity (a vcpu can be scheduled
>>>> out before reaching the synchronization point, and can sched back in at
>>>> points which are slots_lock protected, etc).
>>>>
>>>> This patch changes vcpu_enter_guest to conditionally release/acquire
>>>> slots_lock in case a vcpu state bit is set.
>>>>
>>>> vmexit performance improves by 5-10% on UP guest.
>>>>
>>> Sorry, it looks pretty complex.
>>>
>> Why?
>>
>
> There's a new locking protocol in there. I prefer sticking with the
> existing kernel plumbing, or it gets more and more difficult knowing who
> protects what and in what order you can do things.
>
>>> Have you considered using srcu? It seems to me down/up_read() can
>>> be replaced by srcu_read_lock/unlock(), and after proper conversion
>>> of memslots and io_bus to rcu_assign_pointer(), we can just add
>>> synchronize_srcu() immediately after changing stuff (of course
>>> mmu_lock still needs to be held when updating slots).
>>>
>> I don't see RCU as being suitable because in certain operations you
>> want to stop writers (on behalf of vcpus), do something, and let them
>> continue afterwards. The dirty log, for example. Or any operation that
>> wants to modify lockless vcpu specific data.
>>
>
> kvm_flush_remote_tlbs() (which you'd call after mmu operations), will
> get cpus out of guest mode, and synchronize_srcu() will wait for them to
> drop the srcu "read lock". So it really happens naturally: do an RCU
> update, send some request to all vcpus, synchronize_srcu(), done.
>
>> Also, synchronize_srcu() is limited to preemptible sections.
>>
>> io_bus could use RCU, but I think being able to stop vcpus is also a
>> different requirement. Do you have any suggestion on how to do it in a
>> better way?
>>
>
> We don't need to stop vcpus, just kick them out of guest mode to let
> them notice the new data. SRCU does that well.

Two problems:

1. The removal of memslots/aliases and the zapping of the mmu (to remove
any shadow pages with a stale sp->gfn) must be atomic from the POV of the
pagefault path. Otherwise something like this can happen:

    fault path                              set_memory_region

    walk_addr fetches and
    validates table_gfns
                                            delete memslot
                                            synchronize_srcu
    fetch creates shadow page
    with nonexistent sp->gfn

OR

    mmu_alloc_roots path                    set_memory_region

                                            delete memslot
    root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT
    mmu_check_root(root_gfn)
                                            synchronize_srcu
    kvm_mmu_get_page()
                                            kvm_mmu_zap_all

Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all can
see shadow pages with a stale gfn.

But if you still think it's worthwhile to use RCU, then at least handling
gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry-style
invalidation to the set_memory_region path will be necessary (so that the
gfn_to_pfn validation done in the fault path is discarded in case of a
memslot/alias update).
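To make that concrete, below is a rough sketch of the kind of check I
have in mind, modeled on what mmu_notifier_seq/mmu_notifier_retry already
do for mmu notifier invalidations. The slots_seq field and the helper
names are invented for illustration, this is not against any tree:

#include <linux/kvm_host.h>
#include <linux/errno.h>

/*
 * Invented for illustration: kvm->slots_seq does not exist, it stands
 * for a generation counter bumped by the set_memory_region path.
 */

/* set_memory_region side: bump the generation and zap under mmu_lock */
static void kvm_invalidate_slots(struct kvm *kvm)
{
	spin_lock(&kvm->mmu_lock);
	kvm->slots_seq++;		/* in-flight faults must be retried */
	kvm_mmu_zap_all(kvm);
	spin_unlock(&kvm->mmu_lock);
}

/* fault path side: sample the generation before gfn_to_pfn, recheck
 * under mmu_lock before instantiating sptes */
static int fault_example(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	unsigned long seq;
	pfn_t pfn;

	seq = vcpu->kvm->slots_seq;
	smp_rmb();
	pfn = gfn_to_pfn(vcpu->kvm, gfn);	/* may race with a slot update */

	spin_lock(&vcpu->kvm->mmu_lock);
	if (vcpu->kvm->slots_seq != seq) {
		/* slots changed under us: drop the pfn and redo the fault */
		spin_unlock(&vcpu->kvm->mmu_lock);
		kvm_release_pfn_clean(pfn);
		return -EAGAIN;
	}
	/* ... instantiate sptes here ... */
	spin_unlock(&vcpu->kvm->mmu_lock);
	return 0;
}

With something like this, a fault that resolved its pfn against the old
slots notices the update under mmu_lock and restarts instead of
installing a stale spte.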
2. Another complication is that memslot creation and kvm_iommu_map_pages
are not atomic:

    create memslot
    synchronize_srcu        <----- vcpu grabs gfn reference
                                   without iommu mapping
    kvm_iommu_map_pages

This can be solved by changing kvm_iommu_map_pages (and the new
gfn_to_pfn helper) to take the base_gfn, npages and hva information from
somewhere other than the visible kvm->memslots, so that by the time the
slot becomes visible it is already iommu mapped.

So it appears to me that using RCU introduces more complications / subtle
details than mutual exclusion here. The new request bit which the
original patch introduces is limited to enabling/disabling the
conditional acquisition of slots_lock (calling it a "new locking
protocol" is unfair), in order to improve write acquisition latency.

Better ideas/directions welcome.
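For reference, this is roughly the srcu pattern I understand is being
proposed (a sketch only: kvm->srcu, a pointer-swapped kvm->memslots and
the kvm_memslots container below are assumptions, none of this exists
today):

#include <linux/kvm_host.h>
#include <linux/srcu.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* assumed container so the whole slot array can be swapped atomically */
struct kvm_memslots {
	int nmemslots;
	struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS];
};

/* reader side: vcpu work runs inside an srcu read-side section instead
 * of down_read(&kvm->slots_lock) */
static int gfn_has_slot(struct kvm *kvm, gfn_t gfn)
{
	struct kvm_memslots *slots;
	int i, idx, found = 0;

	idx = srcu_read_lock(&kvm->srcu);
	slots = rcu_dereference(kvm->memslots);
	for (i = 0; i < slots->nmemslots; i++) {
		struct kvm_memory_slot *s = &slots->memslots[i];

		if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages) {
			found = 1;	/* slot data only valid until unlock */
			break;
		}
	}
	srcu_read_unlock(&kvm->srcu, idx);
	return found;
}

/* writer side: publish a new copy, wait for readers, then free the old */
static void install_new_memslots(struct kvm *kvm, struct kvm_memslots *new)
{
	struct kvm_memslots *old = kvm->memslots;

	rcu_assign_pointer(kvm->memslots, new);
	synchronize_srcu(&kvm->srcu);	/* no vcpu dereferences 'old' now */
	kfree(old);

	/*
	 * This ordering is exactly the problem in item 1 above: shadow
	 * pages created against the old slot layout are only removed
	 * here, after the new slots are already visible.
	 */
	spin_lock(&kvm->mmu_lock);
	kvm_mmu_zap_all(kvm);
	spin_unlock(&kvm->mmu_lock);
}

The synchronize_srcu() in there only guarantees that vcpus see the new
pointer; it does not make the mmu state or the iommu mappings catch up
atomically, which is what items 1 and 2 are about.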