Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Aug 11, 2012 at 10:37:54AM +1000, Paul Mackerras wrote:
> On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote:
> > There's no plan. I just wanted to confirm this before converting 
> > to per-memslot flush.
> > 
> > 1) __kvm_set_memory_region line 807:
> > 
> >                  *      - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> >                  *      - kvm_is_visible_gfn (mmu_check_roots)
> >                  */
> >                 kvm_arch_flush_shadow(kvm);
> >                 kfree(old_memslots);
> >         }
> > 
> > This can be converted to per-memslot flush.
> 
> Yes, that would be good.
> 
> > 2) __kvm_set_memory_region line 846:
> > 
> >         /*
> >          * If the new memory slot is created, we need to clear all
> >          * mmio sptes.
> >          */
> >         if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> >                 kvm_arch_flush_shadow(kvm);
> > 
> > This must flush all translations in the new and old GPA ranges:
> > 	a) to remove any mmio sptes that existed in the new GPA range
> > 	   (see ce88decffd17bf9f373cc233c for a description).
> 
> I assume that mmio sptes are ones that are created for accesses to
> guest physical addresses where there is no memslot.  On Book3S HV we
> trap those accesses (on POWER7 at least) but we don't create any
> hardware PTEs for them.  So I don't have anything to do here.
> 
> > 	b) to remove any translations in the old range (this is
> >            necessary because change of GPA base is allowed).
> 
> In fact I need to remove any translations in the old range *before*
> the new memslot gets committed, whereas this happens after that.
> This is why I was doing the flush in kvm_arch_prepare_memory_region.

I see. The problem with that is, given that the fault path (reader 
of memslots) is protected only by SRCU, the following window is open:

A) kvm_arch_prepare_memory_region
B) rcu_assign_pointer(kvm->memslots, slots)
C) kvm_arch_commit_memory_region

The window between A and B where a guest pagefault can instantiate a new
translation using the old memslot information (while you already removed
any translations in the old range).


Avi, Gleb, Alex, do you know why it is necessary to support change of
GPA base again? 

Without taking into consideration backwards compatibility, userspace 
can first delete the slot and later create a new one.

> If the new memslot got committed while there were still some
> translations left in the hardware page table, it's possible that the
> guest could ask to remove one of those hardware PTEs.  As part of that
> process, I get the GPA that the HPTE referred to, look up which
> memslot it belongs to, and use the offset from the base_gfn of the
> memslot to index into the memslot's rmap array.  If the base_gfn has
> been changed then I will hit the wrong rmap entry, or possibly not
> find the memslot at all, and bad things will happen.

So, that notification before-commit-of-new-memslot is only necessary for
base change, correct? That is, you don't need that pre notification for
entirely new memslots in a previously unpopulated range.

> Basically, the rmap array has to be empty before we commit the new
> memslot.
> 
> > Alex/Paul, slot creation should be rare (and modification of GPA base
> > should not be used, even though it is supported). But if you really need
> > a new callback, the two points above are what the second call needs (x86
> > will flush everything).
> > 
> > The other two kvm_arch_flush_shadow in kvm_mmu_notifier_release and
> > kvm_destroy_vm must remove all sptes obviously.
> 
> Will these be the only remaining call sites for kvm_arch_flush_shadow?

Yes, should be. OK, lets first get the callbacks in
kvm_set_memory_region right and later you can restrict
kvm_arch_flush_shadow as necessary.

> I can see an optimization here if no vcpus are running (which should
> be the case) -- I can set a flag to say that the hardware page table
> is completely invalid, and if any vcpu ever does try to run again,
> clear out the hardware page table and flush all TLBs at that point.
> 
> Paul.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux