Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 13, 2012 at 01:34:11PM -0300, Marcelo Tosatti wrote:
> On Sat, Aug 11, 2012 at 10:37:54AM +1000, Paul Mackerras wrote:
> > On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote:
> > > There's no plan. I just wanted to confirm this before converting 
> > > to per-memslot flush.
> > > 
> > > 1) __kvm_set_memory_region line 807:
> > > 
> > >                  *      - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> > >                  *      - kvm_is_visible_gfn (mmu_check_roots)
> > >                  */
> > >                 kvm_arch_flush_shadow(kvm);
> > >                 kfree(old_memslots);
> > >         }
> > > 
> > > This can be converted to per-memslot flush.
> > 
> > Yes, that would be good.
> > 
> > > 2) __kvm_set_memory_region line 846:
> > > 
> > >         /*
> > >          * If the new memory slot is created, we need to clear all
> > >          * mmio sptes.
> > >          */
> > >         if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> > >                 kvm_arch_flush_shadow(kvm);
> > > 
> > > This must flush all translations in the new and old GPA ranges:
> > > 	a) to remove any mmio sptes that existed in the new GPA range
> > > 	   (see ce88decffd17bf9f373cc233c for a description).
> > 
> > I assume that mmio sptes are ones that are created for accesses to
> > guest physical addresses where there is no memslot.  On Book3S HV we
> > trap those accesses (on POWER7 at least) but we don't create any
> > hardware PTEs for them.  So I don't have anything to do here.
> > 
> > > 	b) to remove any translations in the old range (this is
> > >            necessary because change of GPA base is allowed).
> > 
> > In fact I need to remove any translations in the old range *before*
> > the new memslot gets committed, whereas this happens after that.
> > This is why I was doing the flush in kvm_arch_prepare_memory_region.
> 
> I see. The problem with that is, given that the fault path (reader 
> of memslots) is protected only by SRCU, the following window is open:
> 
> A) kvm_arch_prepare_memory_region
> B) rcu_assign_pointer(kvm->memslots, slots)
> C) kvm_arch_commit_memory_region
> 
> The window between A and B where a guest pagefault can instantiate a new
> translation using the old memslot information (while you already removed
> any translations in the old range).
> 
> 
> Avi, Gleb, Alex, do you know why it is necessary to support change of
> GPA base again?
> 
> Without taking into consideration backwards compatibility, userspace 
> can first delete the slot and later create a new one.

Current userspace does not, and can't see why older userspace would. If
we break it and someone complains, it can be fixed. 

So, please:

1) Disallow change of base GPA (can do that in the "Check for overlaps" loop
in __kvm_set_memory_region), returning EINVAL.
2) Introduce kvm_arch_invalidate_memslot (replacing first
kvm_arch_flush_shadow).
3) Introduce kvm_arch_postcommit_memslot (replacing the 
second kvm_arch_flush_shadow).

> > If the new memslot got committed while there were still some
> > translations left in the hardware page table, it's possible that the
> > guest could ask to remove one of those hardware PTEs.  As part of that
> > process, I get the GPA that the HPTE referred to, look up which
> > memslot it belongs to, and use the offset from the base_gfn of the
> > memslot to index into the memslot's rmap array.  If the base_gfn has
> > been changed then I will hit the wrong rmap entry, or possibly not
> > find the memslot at all, and bad things will happen.
> 
> So, that notification before-commit-of-new-memslot is only necessary for
> base change, correct? That is, you don't need that pre notification for
> entirely new memslots in a previously unpopulated range.
> 
> > Basically, the rmap array has to be empty before we commit the new
> > memslot.
> > 
> > > Alex/Paul, slot creation should be rare (and modification of GPA base
> > > should not be used, even though it is supported). But if you really need
> > > a new callback, the two points above are what the second call needs (x86
> > > will flush everything).
> > > 
> > > The other two kvm_arch_flush_shadow in kvm_mmu_notifier_release and
> > > kvm_destroy_vm must remove all sptes obviously.
> > 
> > Will these be the only remaining call sites for kvm_arch_flush_shadow?
> 
> Yes, should be. OK, lets first get the callbacks in
> kvm_set_memory_region right and later you can restrict
> kvm_arch_flush_shadow as necessary.
> 
> > I can see an optimization here if no vcpus are running (which should
> > be the case) -- I can set a flag to say that the hardware page table
> > is completely invalid, and if any vcpu ever does try to run again,
> > clear out the hardware page table and flush all TLBs at that point.
> > 
> > Paul.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux