On Sun, Sep 13, 2009 at 09:26:52AM -0700, Paul E. McKenney wrote:
> On Sun, Sep 13, 2009 at 06:42:49PM +0300, Avi Kivity wrote:
> > On 09/11/2009 01:30 AM, Marcelo Tosatti wrote:
> >>
> >>> We don't need to stop vcpus, just kick them out of guest mode to let
> >>> them notice the new data. SRCU does that well.
> >>>
> >> Two problems:
> >>
> >> 1. The removal of memslots/aliases and zapping of mmu (to remove any
> >> shadow pages with stale sp->gfn) must be atomic from the POV of the
> >> pagefault path. Otherwise something like this can happen:
> >>
> >> fault path                           set_memory_region
> >>
> >
> > srcu_read_lock()
> >
> >> walk_addr fetches and validates
> >> table_gfns
> >>                                      delete memslot
> >>                                      synchronize_srcu
> >>
> >> fetch creates shadow
> >>
> >
> > srcu_read_unlock()
> >
> >> page with nonexistent sp->gfn
> >>
> >
> > I think synchronize_srcu() will be deferred until the fault path is
> > complete (and srcu_read_unlock() runs). Copying someone who knows for
> > sure.
>
> Yes, synchronize_srcu() will block until srcu_read_unlock() in this
> scenario, assuming that the same srcu_struct is used by both.

Right, it will. But this does not stop the fault path from creating
shadow pages with a stale sp->gfn (the only way to do that would be
mutual exclusion, AFAICS).

> >> OR
> >>
> >> mmu_alloc_roots path                 set_memory_region
> >>
> >
> > srcu_read_lock()
> >
> >>
> >>                                      delete memslot
> >> root_gfn = vcpu->arch.cr3 << PAGE_SHIFT
> >> mmu_check_root(root_gfn)             synchronize_rcu
> >> kvm_mmu_get_page()
> >>
> >
> > srcu_read_unlock()
> >
> >>                                      kvm_mmu_zap_all
> >>
> >
> > Ditto, srcu_read_lock() protects us.
>
> Yep!

The RCU read-protected side does not stop a new memslots pointer from
being assigned (with rcu_assign_pointer), does it?

> >> Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all
> >> can see shadow pages with stale gfn.
> >>
> >> But, if you still think it's worthwhile to use RCU, at least handling
> >> gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry
> >> invalidation to the set_memory_region path will be necessary (so that
> >> gfn_to_pfn validation, in the fault path, is discarded in case
> >> of memslot/alias update).
> >
> > It really is worthwhile to reuse complex infrastructure instead of
> > writing new infrastructure.
>
> Marcelo, in your first example, is your concern that the fault path
> needs to detect the memslot deletion?

Yes, it needs to invalidate the leakage, which in this case is a shadow
page data structure that was created containing information from a now
deleted memslot.

> Or that the use of sp->gfn "leaks" out of the SRCU read-side critical
> section?

Yes, the use of a stale sp->gfn leaks outside of the SRCU read-side
critical section, and currently the rest of the code is not ready to
deal with that... but it will have to.

> 						Thanx, Paul
>
> >> 2. Another complication is that memslot creation and kvm_iommu_map_pages
> >> are not atomic.
> >>
> >> create memslot
> >> synchronize_srcu
> >>                  <----- vcpu grabs gfn reference without
> >>                         iommu mapping.
> >> kvm_iommu_map_pages
> >>
> >> Which can be solved by changing kvm_iommu_map_pages (and a new gfn_to_pfn
> >> helper) to use base_gfn,npages,hva information from somewhere other
> >> than the visible kvm->memslots (so that when the slot becomes visible it's
> >> already iommu mapped).
> >
> > Yes. It can accept a memslots structure instead of deriving it from
> > kvm->memslots. Then we do a rcu_assign_pointer() to switch the tables.

Alright.
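
Something like the following is what I understand you to mean -- just a
rough sketch, not the actual patch: kvm_copy_memslots(),
kvm_insert_memslot() and kvm_iommu_map_slot() are placeholder names,
and it assumes kvm->memslots becomes an RCU-managed pointer with a
per-VM srcu_struct (kvm->srcu):

#include <linux/kvm_host.h>
#include <linux/srcu.h>
#include <linux/slab.h>

/*
 * Update side: build a new memslots structure off to the side, iommu-map
 * against it, publish it with rcu_assign_pointer(), then wait for
 * in-flight readers before zapping shadow pages that may hold stale gfns.
 */
static int update_memslots_sketch(struct kvm *kvm,
				  struct kvm_memory_slot *new)
{
	struct kvm_memslots *slots, *old;

	slots = kvm_copy_memslots(kvm);		/* duplicate the current table */
	if (!slots)
		return -ENOMEM;

	kvm_insert_memslot(slots, new);		/* modify only the private copy */

	/*
	 * iommu-map before the slot is visible, so no vcpu can grab a
	 * gfn reference without an iommu mapping (problem 2 above).
	 */
	kvm_iommu_map_slot(kvm, new);

	old = kvm->memslots;
	rcu_assign_pointer(kvm->memslots, slots);	/* publish */
	synchronize_srcu(&kvm->srcu);		/* waits for srcu_read_unlock() */

	kvm_mmu_zap_all(kvm);			/* drop shadow pages with stale sp->gfn */
	kfree(old);
	return 0;
}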
> >> So it appears to me using RCU introduces more complications / subtle
> >> details than mutual exclusion here. The new request bit which the
> >> original patch introduces is limited to enabling/disabling conditional
> >> acquisition of slots_lock (calling it a "new locking protocol" is
> >> unfair) to improve write acquisition latency.
> >
> > It's true that it is not a new locking protocol. But I feel it is
> > worthwhile to try to use rcu for this; at least it will make it easier
> > for newcomers (provided they understand rcu).

Sure.
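
For concreteness, the read side I keep referring to would look roughly
like this (again only a sketch: kvm->srcu and walk_and_fetch() are
made-up names for illustration). A concurrent synchronize_srcu() in
set_memory_region blocks until the srcu_read_unlock() below, but
nothing in between prevents the fetch from instantiating a shadow page
against a slot the writer is deleting -- that is the stale sp->gfn
window:

static int page_fault_sketch(struct kvm_vcpu *vcpu, gva_t addr,
			     u32 error_code)
{
	struct kvm_memslots *slots;
	int idx, r;

	idx = srcu_read_lock(&vcpu->kvm->srcu);
	slots = rcu_dereference(vcpu->kvm->memslots);

	/*
	 * walk_addr + fetch: can still create an sp whose gfn belongs
	 * to a memslot that set_memory_region deletes concurrently.
	 */
	r = walk_and_fetch(vcpu, slots, addr, error_code);

	srcu_read_unlock(&vcpu->kvm->srcu, idx);
	return r;
}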