Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly

Paul Mackerras <paulus@xxxxxxxxx> · Sat, 11 Aug 2012 10:37:54 +1000

On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote:
> There's no plan. I just wanted to confirm this before converting 
> to per-memslot flush.
> 
> 1) __kvm_set_memory_region line 807:
> 
>                  *      - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
>                  *      - kvm_is_visible_gfn (mmu_check_roots)
>                  */
>                 kvm_arch_flush_shadow(kvm);
>                 kfree(old_memslots);
>         }
> 
> This can be converted to per-memslot flush.

Yes, that would be good.

> 2) __kvm_set_memory_region line 846:
> 
>         /*
>          * If the new memory slot is created, we need to clear all
>          * mmio sptes.
>          */
>         if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
>                 kvm_arch_flush_shadow(kvm);
> 
> This must flush all translations in the new and old GPA ranges:
> 	a) to remove any mmio sptes that existed in the new GPA range
> 	   (see ce88decffd17bf9f373cc233c for a description).

I assume that mmio sptes are ones that are created for accesses to
guest physical addresses where there is no memslot.  On Book3S HV we
trap those accesses (on POWER7 at least) but we don't create any
hardware PTEs for them.  So I don't have anything to do here.

> 	b) to remove any translations in the old range (this is
>            necessary because change of GPA base is allowed).

In fact I need to remove any translations in the old range *before*
the new memslot gets committed, whereas this happens after that.
This is why I was doing the flush in kvm_arch_prepare_memory_region.

If the new memslot got committed while there were still some
translations left in the hardware page table, it's possible that the
guest could ask to remove one of those hardware PTEs.  As part of that
process, I get the GPA that the HPTE referred to, look up which
memslot it belongs to, and use the offset from the base_gfn of the
memslot to index into the memslot's rmap array.  If the base_gfn has
been changed then I will hit the wrong rmap entry, or possibly not
find the memslot at all, and bad things will happen.

Basically, the rmap array has to be empty before we commit the new
memslot.

> Alex/Paul, slot creation should be rare (and modification of GPA base
> should not be used, even though it is supported). But if you really need
> a new callback, the two points above are what the second call needs (x86
> will flush everything).
> 
> The other two kvm_arch_flush_shadow in kvm_mmu_notifier_release and
> kvm_destroy_vm must remove all sptes obviously.

Will these be the only remaining call sites for kvm_arch_flush_shadow?
I can see an optimization here if no vcpus are running (which should
be the case) -- I can set a flag to say that the hardware page table
is completely invalid, and if any vcpu ever does try to run again,
clear out the hardware page table and flush all TLBs at that point.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html