On Wed, Aug 15, 2012 at 12:26:05PM +0300, Avi Kivity wrote: > On 08/14/2012 01:04 AM, Marcelo Tosatti wrote: > > On Mon, Aug 13, 2012 at 01:34:11PM -0300, Marcelo Tosatti wrote: > >> On Sat, Aug 11, 2012 at 10:37:54AM +1000, Paul Mackerras wrote: > >> > On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote: > >> > > There's no plan. I just wanted to confirm this before converting > >> > > to per-memslot flush. > >> > > > >> > > 1) __kvm_set_memory_region line 807: > >> > > > >> > > * - gfn_to_hva (kvm_read_guest, gfn_to_pfn) > >> > > * - kvm_is_visible_gfn (mmu_check_roots) > >> > > */ > >> > > kvm_arch_flush_shadow(kvm); > >> > > kfree(old_memslots); > >> > > } > >> > > > >> > > This can be converted to per-memslot flush. > >> > > >> > Yes, that would be good. > >> > > >> > > 2) __kvm_set_memory_region line 846: > >> > > > >> > > /* > >> > > * If the new memory slot is created, we need to clear all > >> > > * mmio sptes. > >> > > */ > >> > > if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) > >> > > kvm_arch_flush_shadow(kvm); > >> > > > >> > > This must flush all translations in the new and old GPA ranges: > >> > > a) to remove any mmio sptes that existed in the new GPA range > >> > > (see ce88decffd17bf9f373cc233c for a description). > >> > > >> > I assume that mmio sptes are ones that are created for accesses to > >> > guest physical addresses where there is no memslot. On Book3S HV we > >> > trap those accesses (on POWER7 at least) but we don't create any > >> > hardware PTEs for them. So I don't have anything to do here. > >> > > >> > > b) to remove any translations in the old range (this is > >> > > necessary because change of GPA base is allowed). > >> > > >> > In fact I need to remove any translations in the old range *before* > >> > the new memslot gets committed, whereas this happens after that. > >> > This is why I was doing the flush in kvm_arch_prepare_memory_region. > >> > >> I see. The problem with that is, given that the fault path (reader > >> of memslots) is protected only by SRCU, the following window is open: > >> > >> A) kvm_arch_prepare_memory_region > >> B) rcu_assign_pointer(kvm->memslots, slots) > >> C) kvm_arch_commit_memory_region > >> > >> The window between A and B where a guest pagefault can instantiate a new > >> translation using the old memslot information (while you already removed > >> any translations in the old range). > >> > >> > >> Avi, Gleb, Alex, do you know why it is necessary to support change of > >> GPA base again? > >> > >> Without taking into consideration backwards compatibility, userspace > >> can first delete the slot and later create a new one. > > > > Current userspace does not, and can't see why older userspace would. If > > we break it and someone complains, it can be fixed. > > > > So, please: > > > > 1) Disallow change of base GPA (can do that in the "Check for overlaps" loop > > in __kvm_set_memory_region), returning EINVAL. > > 2) Introduce kvm_arch_invalidate_memslot (replacing first > > kvm_arch_flush_shadow). > > 3) Introduce kvm_arch_postcommit_memslot (replacing the > > second kvm_arch_flush_shadow). > > > Is there a reason why the kernel can't do the same? First remove the > old slot, then add the new one? > > We break atomicity, but it's less likely that guests rely on it than an > old qemu not relying on gpa-changing memory update. The guest should not expect memory accesses to an address to behave sanely while changing a BAR anyway. Yes, compatibility for change of GPA base can be done in the kernel. I can look into it next week if nobody has done so at that point. Thanks Avi. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html