On Mon, Aug 13, 2012 at 01:34:11PM -0300, Marcelo Tosatti wrote:
> On Sat, Aug 11, 2012 at 10:37:54AM +1000, Paul Mackerras wrote:
> > On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote:
> > > There's no plan. I just wanted to confirm this before converting
> > > to per-memslot flush.
> > >
> > > 1) __kvm_set_memory_region line 807:
> > >
> > >     *    - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> > >     *    - kvm_is_visible_gfn (mmu_check_roots)
> > >     */
> > >    kvm_arch_flush_shadow(kvm);
> > >    kfree(old_memslots);
> > > }
> > >
> > > This can be converted to per-memslot flush.
> >
> > Yes, that would be good.
> >
> > > 2) __kvm_set_memory_region line 846:
> > >
> > >    /*
> > >     * If the new memory slot is created, we need to clear all
> > >     * mmio sptes.
> > >     */
> > >    if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> > >        kvm_arch_flush_shadow(kvm);
> > >
> > > This must flush all translations in the new and old GPA ranges:
> > > a) to remove any mmio sptes that existed in the new GPA range
> > > (see ce88decffd17bf9f373cc233c for a description).
> >
> > I assume that mmio sptes are ones that are created for accesses to
> > guest physical addresses where there is no memslot. On Book3S HV we
> > trap those accesses (on POWER7 at least) but we don't create any
> > hardware PTEs for them. So I don't have anything to do here.
> >
> > > b) to remove any translations in the old range (this is
> > > necessary because change of GPA base is allowed).
> >
> > In fact I need to remove any translations in the old range *before*
> > the new memslot gets committed, whereas this happens after that.
> > This is why I was doing the flush in kvm_arch_prepare_memory_region.

> I see.
> The problem with that is, given that the fault path (reader
> of memslots) is protected only by SRCU, the following window is open:
>
> A) kvm_arch_prepare_memory_region
> B) rcu_assign_pointer(kvm->memslots, slots)
> C) kvm_arch_commit_memory_region
>
> In the window between A and B, a guest pagefault can instantiate a new
> translation using the old memslot information (while you have already
> removed any translations in the old range).
>
> Avi, Gleb, Alex, do you know why it is necessary to support change of
> GPA base again?
>
> Without taking into consideration backwards compatibility, userspace
> can first delete the slot and later create a new one.

Current userspace does not, and I can't see why older userspace would.
If we break it and someone complains, it can be fixed.

So, please:

1) Disallow change of base GPA (can do that in the "Check for overlaps"
   loop in __kvm_set_memory_region), returning EINVAL.

2) Introduce kvm_arch_invalidate_memslot (replacing the first
   kvm_arch_flush_shadow).

3) Introduce kvm_arch_postcommit_memslot (replacing the second
   kvm_arch_flush_shadow).

> > If the new memslot got committed while there were still some
> > translations left in the hardware page table, it's possible that the
> > guest could ask to remove one of those hardware PTEs. As part of that
> > process, I get the GPA that the HPTE referred to, look up which
> > memslot it belongs to, and use the offset from the base_gfn of the
> > memslot to index into the memslot's rmap array. If the base_gfn has
> > been changed then I will hit the wrong rmap entry, or possibly not
> > find the memslot at all, and bad things will happen.
>
> So, that notification before-commit-of-new-memslot is only necessary for
> base change, correct? That is, you don't need that pre notification for
> entirely new memslots in a previously unpopulated range.
>
> > Basically, the rmap array has to be empty before we commit the new
> > memslot.
> >
> > > Alex/Paul, slot creation should be rare (and modification of GPA base
> > > should not be used, even though it is supported). But if you really need
> > > a new callback, the two points above are what the second call needs (x86
> > > will flush everything).
> > >
> > > The other two kvm_arch_flush_shadow in kvm_mmu_notifier_release and
> > > kvm_destroy_vm must remove all sptes obviously.
> >
> > Will these be the only remaining call sites for kvm_arch_flush_shadow?
>
> Yes, should be. OK, let's first get the callbacks in
> kvm_set_memory_region right and later you can restrict
> kvm_arch_flush_shadow as necessary.

> > I can see an optimization here if no vcpus are running (which should
> > be the case) -- I can set a flag to say that the hardware page table
> > is completely invalid, and if any vcpu ever does try to run again,
> > clear out the hardware page table and flush all TLBs at that point.
> >
> > Paul.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html